
version 13.0
capture log close
set more 1
log using annuityanalyses.log, replace
clear
set linesize 140

// ******************************  AnnuityAnalyses.do  ***********************************
**
** This program performs all the analyses for the paper:
** 
**      Behavioral Impediments to Valuing Annuities: 
**            Complexity and Choice Bracketing
** 
** to be published in the Review of Economics and Statistics in 2020
** 
**                           by
**       Jeffrey R. Brown, Arie Kapteyn, Erzo F.P. Luttmer, 
**             Olivia S. Mitchell, and Anya Samek
**
** This program uses two datasets:
** 1. UAS49_WithoutID -- Data from wave 49 of the Understanding America Study (UAS)
**                       augmented with select variables from other waves
**                       of the UAS. Variables such as UasID, state of residence,
**                       and other variables that could help identify the respondents
**                       are suppressed.
**
**                       IMPORTANT: If you use this UAS data, kindly email your 
**                                  full name and affiliation to uas-l@usc.edu 
**                                  so that the administrators of the Understanding 
**                                  America Study can keep track of those who use 
**                                  their data.
**
**                       You can access the same data but including identifiers
**                       (which allows you to merge the data to any other wave in
**                       the UAS) by setting up an account with the UAS at
**                       uasdata.usc.edu
**
** 2. cps_ASEC2016.dat -- CPS extract with basic demographics obtained from 
**                       IPUMS-CPS (http://cps.ipums.org/cps/)
**
**                       Year: 2016 Annual Social and Economic Supplement (2016 used to match 
**                            timing of UAS sample)
**                       Age selection: 18+
**
**
**
** This program produces the following output:
**
** 1. AnnuityAnalyses.log -- Log file with the output for all the tables.
**
**                    To search for a table, search for "Table #" for 
**                         normal tables or "Table A##" for appendix tables
**                    To search for a figure, search for "Figure #" for 
**                         normal figures or "Figure A#" for appendix figures
**                    To search for a text claim, search for "Text Claim ##" 
**                         where ## is determined by the order in which the claims 
**                         appear in the text, or alternatively, simply search on
**                         a snippet of the text in the paper that contains an
**                         empirical result that is not in the tables.
**
** 2. Excel files with the data for the figures:
**
**                    rawdata_fig1a_midbuy.xls   - Figure 1, CDF of buy values
**                    rawdata_fig1b_midsell.xls  - Figure 1, CDF of sell values
**                    rawdata_fig2_logdiff.xls   - Figure 2, CDF of log(sell/buy)
**                    rawdata_figA1a_midbuy.xls  - Appendix Figure A1, CDF of buy values
**                    rawdata_figA1b_midsell.xls - Appendix Figure A1, CDF of sell values
**                    rawdata_figA2_spread.xls   - Appendix Figure A2, CDF of spread
**                 
// ***************************************************************************************







// ***************************************************************************************
// ***************************************************************************************
//
//  PART 1: Data cleaning and definitions of variables
//
// ***************************************************************************************
// ***************************************************************************************



** Data from Wave49 of the Understanding America Study
**
** This wave contained our experiment.
**
**
use uas49_withoutid





// ***************************************************************************************
//            Label, check, and name the primary treatment variables 
// ***************************************************************************************

** give variables that contain the randomizations more intuitive names
** -------------------------------------------------------------------
rename randomizer_advice_ssb          ss_benefit
rename randomizer_advice_lsstartvalue ls_startvalue
rename randomizer_name                vignette_name
rename randomizer_advice_intro        complexity
rename ed_001                         test_question1
rename ed_002                         test_question2

**** Create Binary Variables with "Yes" corresponding to 1 and "No" corresponding to 0
**** with a name that makes clear what 1 corresponds to
**** All newly generated variables are also labeled and checked to ensure no missing values

** define 0 as No and 1 as Yes
label define noyes 0 "No" 1 "Yes"			

** 1 means person received consequence message treatment
gen byte  consequence = 2 - randomizer_education		
label var consequence "Consequence Treatment"
label val consequence noyes
assert    consequence < .
drop randomizer_education						
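
** Worked example (comments only, not executed): the transformation x -> 2 - x
** maps the raw randomizer codes {1, 2} onto {1, 0}, so raw code 1 becomes "Yes" (1)
** and raw code 2 becomes "No" (0). The same recode is used for ls_first,
** quick_first, and sell_first below. Assuming the raw randomizer only takes the
** values 1 and 2, a stricter check than "< ." could be run before the drop above:
**     assert inlist(randomizer_education, 1, 2)
**     assert inlist(consequence, 0, 1)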

** 1 means Lump Sum amount was mentioned first
gen byte  ls_first  = 2 - randomizer_advice_answer_order	
label var ls_first "LS option mentioned first"
label val ls_first noyes
assert    ls_first < . 
drop randomizer_advice_answer_order

** 1 means person was reminded of quick spend down consequences first when receiving message
** It is defined for everyone but only affected those receiving the consequence message
gen byte  quick_first = 2 - randomizer_education_block
label var quick_first "Conseq. msg.: quick spend down mentioned first"
label val quick_first noyes
assert    quick_first < .
drop randomizer_education_block

** 1 means sell question was asked first (0 means buy question was asked first)
gen byte  sell_first = 2 - randomizer_advice_order
label var sell_first "Sell question first, then buy question"
label val sell_first noyes
assert    sell_first < .
drop randomizer_advice_order


**** Create dummies for each randomization variable that is not binary ****
**** Also check for missing values and label newly created dummies     ****
** ------------------------------------------------------------------------

** starting lump sum value proposed in exchange for annuity increase/decrease
tab       ls_startvalue, gen(ls_startvalue_)
assert    ls_startvalue < .
label var ls_startvalue_1 "LS low: 10k"		//starting LS value of $10,000
label var ls_startvalue_2 "LS medium: 20k" 	//starting LS value of $20,000
label var ls_startvalue_3 "LS high: 30k"	//starting LS value of $30,000

** name/gender assigned to primary vignette person
** (i.e., the person to whom the respondent gives advice on buying/selling
**        the annuity)
tab       vignette_name, gen(vignette_name_)
assert    vignette_name < .
label var vignette_name_1 "Mr. Jones"
label var vignette_name_2 "Mrs. Jones"
label var vignette_name_3 "Mr. Smith"
label var vignette_name_4 "Mrs. Smith"

** type of complexity/no complexity added to vignette
tab       complexity, gen(complexity_)
assert    complexity < .
label var complexity_1 "No complexity: Narrow Spread"
label var complexity_2 "Complexity: Wide Spread"
label var complexity_3 "Complexity: Added Info"
 
** initial Social Security benefit of the vignette person
tab       ss_benefit, gen(ss_benefit_)
assert    ss_benefit < .
label var ss_benefit_1 "SSbenefit 800"		//initial SS benefit of $800/month
label var ss_benefit_2 "SSbenefit 1200"		//initial SS benefit of $1,200/month
label var ss_benefit_3 "SSbenefit 1600"		//initial SS benefit of $1,600/month
label var ss_benefit_4 "SSbenefit 2000"		//initial SS benefit of $2,000/month
 
** Also create Social Security benefits in hundreds of dollars
gen       ss_benefit100dollar = 4 + ss_benefit*4
label var ss_benefit100dollar "SS benefit (in $100)"
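
** Worked example (comments only, not executed): ss_benefit takes the codes 1-4,
** so 4 + 4*ss_benefit gives 8, 12, 16, 20, i.e. monthly benefits of $800, $1,200,
** $1,600, and $2,000 measured in hundreds of dollars, matching the labels of
** ss_benefit_1 through ss_benefit_4.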
 
 
 
**********  Treatment variables derived from primary treatment variables        *************
**********	these variables are also labeled, defined, and checked for missing values********
** ------------------------------------------------------------------------------------------

 
** Indicator for any complexity treatment (either wide spread or added info)
gen       any_complexity = complexity==2 | complexity==3
label var any_complexity "Any Complexity"
assert    any_complexity < .
 
** The key treatments interacted
** 2x3 design, meaning different types of complexity are examined individually
gen     treat2x3 = 1 if complexity==1 & consequence==0	// no complexity          & no consequence msg
replace treat2x3 = 2 if complexity==2 & consequence==0	// wide spread complexity & no consequence msg
replace treat2x3 = 3 if complexity==3 & consequence==0	// added info complexity  & no consequence msg
replace treat2x3 = 4 if complexity==1 & consequence==1	// no complexity          & yes consequence msg
replace treat2x3 = 5 if complexity==2 & consequence==1	// wide spread complexity & yes consequence msg
replace treat2x3 = 6 if complexity==3 & consequence==1	// added info complexity  & yes consequence msg

** Label and define each of the 2x3 treatment interactions
label var treat2x3 "Main treatments, 2x3 design"
label def treat2x3  1 "Baseline"  ///
                    2 "Wide Spread; NO conseq. msg" 	///
                    3 "Added Info ; NO conseq. msg" 	///
                    4 "Narrow Spread; CONSEQ. msg"      ///
                    5 "Wide Spread  ; CONSEQ. msg"   	///
                    6 "Added Info   ; CONSEQ. msg"		
label val treat2x3 treat2x3                    

** Generate and label dummies for each of the 2x3 key treatments
tab treat2x3, gen(treat2x3_)
label var treat2x3_1 "Baseline"  
label var treat2x3_2 "Wide Spread; NO conseq. msg" 
label var treat2x3_3 "Added Info ; NO conseq. msg" 
label var treat2x3_4 "Narrow Spread; CONSEQ. msg"  
label var treat2x3_5 "Wide Spread  ; CONSEQ. msg"  
label var treat2x3_6 "Added Info   ; CONSEQ. msg"
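
** Illustrative cross-check (comments only, not executed; assumes complexity takes
** only the values 1, 2, 3 and consequence only 0, 1): the six replace statements
** that define treat2x3 follow the closed form treat2x3 = complexity + 3*consequence,
** e.g. wide spread (2) with the consequence message (1) gives 2 + 3 = 5. A sketch
** of a check one could run at this point:
**     assert treat2x3 == complexity + 3*consequence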


** The key treatments interacted
** 2x2 design with any complexity instead of analyzing complexities separately
gen     treat2x2 = 1 if any_complexity==0 & consequence==0	// no complexity  & no consequence msg
replace treat2x2 = 2 if any_complexity==1 & consequence==0	// any complexity & no consequence msg
replace treat2x2 = 3 if any_complexity==0 & consequence==1	// no complexity  & yes consequence msg
replace treat2x2 = 4 if any_complexity==1 & consequence==1	// any complexity & yes consequence msg

** Label and define each of the 2x2 treatment interactions
label var treat2x2 "Main treatments, 2x2 design"   
label def treat2x2  1 "Baseline"       ///      
                    2 "Complexity; NO conseq. msg" 	///
                    3 "NO Complexity; CONSEQ. msg"  ///
                    4 "Complexity   ; CONSEQ. msg"                   
label val treat2x2 treat2x2   

** Generate and label dummy variables for each of the 2x2 treatments                 
tab treat2x2, gen(treat2x2_)
label var treat2x2_1 "Baseline"                 
label var treat2x2_2 "Complexity; NO conseq. msg" 
label var treat2x2_3 "NO Complexity; CONSEQ. msg" 
label var treat2x2_4 "Complexity   ; CONSEQ. msg"      
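
** Illustrative cross-check (comments only, not executed): with any_complexity
** and consequence both coded 0/1, the four cells of treat2x2 follow the closed
** form treat2x2 = 1 + any_complexity + 2*consequence, which could be verified with
**     assert treat2x2 == 1 + any_complexity + 2*consequence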



**************** Check and create the Sell / Buy variables ***********
** -------------------------------------------------------------------

** Recode the variables measuring whether each sell or buy offer was accepted
** (subtracting 1 so that 0 = No and 1 = Yes), give them shorter names, and label
** them using a foreach loop
foreach num of numlist 1/5{
  gen byte sell`num' = ad_001_`num' - 1
  drop ad_001_`num'
  
  gen byte buy`num' = ad_002_`num' - 1
  drop ad_002_`num'
  
  label variable sell`num' "sell`num'"
  label variable buy`num' "buy`num'"
}

** Attach the noyes value label to the sell and buy responses:
** 1 means the buy/sell offer was accepted, 0 means it was rejected
label val sell1 sell2 sell3 sell4 sell5 buy1 buy2 buy3 buy4 buy5 noyes

** Convert the offered lump-sum amounts into numerics (removing commas)
** and create sellprice1-sellprice5 and buyprice1-buyprice5 using a foreach loop
foreach num of numlist 1/5{
  destring flsellpayment_`num'_, ignore(",") replace
  gen sellprice`num'=flsellpayment_`num'_
  label variable sellprice`num' "sellprice`num'"
  
  destring flmakepayment_`num'_, ignore(",") replace
  gen buyprice`num'=flmakepayment_`num'_
  label variable buyprice`num' "buyprice`num'"
}

**** Check that the survey was implemented as we instructed ****
** 1. Check that starting value specified was used
assert sellprice1==10000 if ls_startvalue==1 & sellprice1<.
assert sellprice1==20000 if ls_startvalue==2 & sellprice1<.
assert sellprice1==30000 if ls_startvalue==3 & sellprice1<.

** 2. rejected sell prices should lead to question about a higher sell price next
assert sellprice2 > sellprice1 if sell1==0
assert sellprice3 > sellprice2 if sell2==0

* The survey instrument malfunctioned for this one observation:
* after rejecting $100k, the person should have been asked 
* about $200k, not about $50k.
* Perhaps the use of the back button induced this.
gen messedup = (sellprice4 <= sellprice3 & sell3==0)
count if messedup

* Same assertion as before but now we account for "messed up" observation
assert sellprice4 > sellprice3 if sell3==0 & ~messedup	
assert sellprice5 > sellprice4 if sell4==0 & ~messedup

** 3. accepted sell price should lead to question about a lower sell price next
assert sellprice2 < sellprice1 if sell1==1
assert sellprice3 < sellprice2 if sell2==1
assert sellprice4 < sellprice3 if sell3==1 
assert sellprice5 < sellprice4 if sell4==1 

** 4. rejected buy price should lead to question about a lower buy price next
assert buyprice2 < buyprice1 if buy1==0
assert buyprice3 < buyprice2 if buy2==0
assert buyprice4 < buyprice3 if buy3==0 
assert buyprice5 < buyprice4 if buy4==0 

** 5. accepted buy price should lead to question about a higher buy price next
assert buyprice2 > buyprice1 if buy1==1
assert buyprice3 > buyprice2 if buy2==1
assert buyprice4 > buyprice3 if buy3==1 
assert buyprice5 > buyprice4 if buy4==1 

** Determine the supremum and infimum of the buy and sell prices based on
** i)  (sup) the lowest sell price accepted and (inf) highest sell price rejected, and
** ii) (inf) the highest buy price accepted and (sup) the lowest buy price rejected.
**
** Topcode the sell & buy price at $1 million, which is twice the highest sell or buy price offered,
** and bottom code the sell and buy price at $0.
**
** This follows Brown, Kapteyn, Luttmer, Mitchell, JEEA 2017
**
** The rationale of this topcode is that if the distribution of valuations
** is a Pareto distribution with parameter 2 (as seems to be the case for many 
** valuation, income, and wealth distributions in the right tail), then
** the expected value conditional on exceeding the topcode is twice the topcode.
**
** Note: because of the somewhat arbitrary nature of the topcode, the mean in levels
** is not that meaningful because it is sensitive to this topcode. 
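
** Worked derivation (comments only, not executed; a sketch of the standard
** Pareto result behind the rationale above): if valuations X above a scale x_m
** follow a Pareto law with shape a > 1, the density is
**     f(x) = a * x_m^a / x^(a+1)   for x >= x_m,
** and for any cutoff c >= x_m
**     E[X | X > c] = [a * x_m^a * c^(1-a) / (a-1)] / (x_m/c)^a = a/(a-1) * c .
** With a = 2 this gives E[X | X > c] = 2c. Taking c = $500,000 (half the topcode,
** i.e. the highest price offered), a respondent with no accepted sell offer (or no
** rejected buy offer) gets an upper bound of 2 * 500,000 = 1,000,000.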

* generate numbered variables for all of the accepted sell prices, 
* rejected sell prices, accepted buy prices, and rejected buy prices 
* using a foreach loop
foreach num of numlist 1/5 {
    gen sellpriceyes`num' = sellprice`num' if sell`num'==1
    gen  sellpriceno`num' = sellprice`num' if sell`num'==0
    gen  buypriceyes`num' =  buyprice`num' if  buy`num'==1
    gen   buypriceno`num' =  buyprice`num' if  buy`num'==0
}

** First code the sell price
** Find the sup and inf sell prices using rowmin and rowmax functions
egen sellpricesup=rowmin(sellpriceyes1 sellpriceyes2 sellpriceyes3 sellpriceyes4 sellpriceyes5)
egen sellpriceinf=rowmax(sellpriceno1  sellpriceno2  sellpriceno3  sellpriceno4  sellpriceno5) 

** Label sup and inf sell variables
label variable sellpricesup "upper bound on selling price"
label variable sellpriceinf "lower bound on selling price"
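
** Worked example (comments only, not executed; hypothetical responses): suppose
** a respondent is asked about selling at 20,000 (rejects), 40,000 (accepts),
** 30,000 (rejects), 35,000 (accepts), and 32,500 (rejects). Then the accepted
** prices are {40,000, 35,000} and the rejected prices are {20,000, 30,000, 32,500},
** so sellpricesup = rowmin = 35,000 and sellpriceinf = rowmax = 32,500: the
** respondent's selling valuation lies between 32,500 and 35,000.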

** implement top and bottom coding
** Topcode sell price at $1,000,000 - twice the highest sell price offered
mvencode sellpricesup, mv(1000000)

** Bottomcode sell price at $0
mvencode sellpriceinf, mv(0)
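
** Illustration (comments only, not executed): egen's rowmin() and rowmax() return
** missing when all of their arguments are missing. A respondent who rejects all
** five sell offers therefore has no accepted price, and mvencode sets sellpricesup
** to 1,000,000; symmetrically, a respondent who accepts all five offers gets
** sellpriceinf = 0. Respondents with missing responses are reset to missing below.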

** Create sellmissing variable to check for missing sell values
** If any sell response is missing or the observation is messed up, set sup and inf values to missing
gen sellmissing = missing(sell1, sell2, sell3, sell4, sell5)
replace sellpricesup = . if sellmissing | messedup
replace sellpriceinf = . if sellmissing | messedup

** Check to make sure there are no unintentional missing values
assert  sellpricesup < . if ~(sellmissing | messedup)
assert  sellpriceinf < . if ~(sellmissing | messedup)

** Second, code the buy price
** Find the inf and sup buy prices using rowmax and rowmin functions
egen buypriceinf=rowmax(buypriceyes1 buypriceyes2 buypriceyes3 buypriceyes4 buypriceyes5)
egen buypricesup=rowmin(buypriceno1  buypriceno2  buypriceno3  buypriceno4  buypriceno5) 

** Label sup and inf buy variables
label var buypricesup "upper bound on buying price"
label var buypriceinf "lower bound on buying price"

** implement top and bottom coding same as for sell prices
mvencode buypricesup, mv(1000000)
mvencode buypriceinf, mv(0)

** Create buymissing to check for missing buy values
** If any buy response is missing or the observation is messed up, set sup and inf values to missing
gen buymissing = missing(buy1, buy2, buy3, buy4, buy5)
replace buypricesup = . if buymissing | messedup
replace buypriceinf = . if buymissing | messedup

** Check to make sure no unintentional missing values
assert  buypricesup < . if ~(buymissing | messedup)
assert  buypriceinf < . if ~(buymissing | messedup)

** Define the midpoints and log-midpoints of annuity valuations
** and the spread variable, following the definition of BKLM(2017)
* Generate and label log-midpoints and spread variables
gen double midbuy         = (buypricesup +buypriceinf)/2	// avg of buy sup and inf
gen double midsell        = (sellpricesup+sellpriceinf)/2	// avg of sell sup and inf
gen double logbuyprice    = log(midbuy)						// log of midbuy
gen double logsellprice   = log(midsell)					// log of midsell
gen double logspread      = abs(logsellprice-logbuyprice)	// abs diff between logsell and logbuy
gen double logsellbuydiff =     logsellprice-logbuyprice 	// same as logspread without abs
gen double meanlogprice   =    (logsellprice+logbuyprice)/2	// avg of logsell and logbuy price
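
** Worked example (comments only, not executed; hypothetical values): with
** buypriceinf = 25,000 and buypricesup = 35,000, midbuy = 30,000; with
** sellpriceinf = 32,500 and sellpricesup = 35,000, midsell = 33,750. Then
**     logsellbuydiff = ln(33,750/30,000) = ln(1.125), approx. 0.118 = logspread,
** and meanlogprice = ln(sqrt(midsell*midbuy)), the log of the geometric mean
** of the two midpoints.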

label var midbuy         "Buy price (midpoint)"
label var midsell        "Sell price (midpoint)"
label var logbuyprice    "Log Buy"
label var logsellprice   "Log Sell"
label var logspread      "Log Spread" 
label var logsellbuydiff "Log Sell - Log Buy"
label var meanlogprice   "Mean of log sell and log buy price"

** Given that the vignette person has $100k in savings, they cannot logically spend more than $100k
** on the annuity. Hence there is a logical (though not mechanical) topcode on the buy value.
** To make sure this is not driving anything, we create a spread variable where we put a mechanical topcode on
** both the buy and the sell value
gen logspread_100k = abs(log((min(100000,sellpricesup)+min(100000,sellpriceinf))/2)   ///
                       - log((min(100000, buypricesup)+min(100000, buypriceinf))/2))  if logspread<.
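
** Worked example (comments only, not executed; hypothetical values): a respondent
** with sellpriceinf = 500,000 and sellpricesup topcoded at 1,000,000 has
** midsell = 750,000. If both buy bounds are below 100k with midbuy = 50,000, then
**     logspread      = |ln(750,000/50,000)| = ln(15), approx. 2.708
**     logspread_100k = |ln(100,000/50,000)| = ln(2),  approx. 0.693
** so the mechanical 100k cap limits how much extreme answers add to the spread.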





************ Generate and label the demographic control variables ***************
** ------------------------------------------------------------------------------
	
** dividing age^2 by 100 keeps the coefficients legible
gen agesq = age^2 / 100
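
** Note on interpretation (comments only, not executed; b1 and b2 are placeholder
** names for regression coefficients): with terms b1*age + b2*agesq, where
** agesq = age^2/100, the marginal effect of one more year of age is
**     b1 + 2*b2*age/100 ,
** e.g. at age 65 it equals b1 + 1.3*b2.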
label var agesq "Age squared divided by 100"
  
** age categories for tables with summary statistics
recode age (18/34=1 "18-34") (35/49=2 "35-49") (50/64=3 "50-64") (65/106=4 "65+"), gen(agecat)
gen agecat_18_34=agecat==1
gen agecat_35_49=agecat==2
gen agecat_50_64=agecat==3
gen agecat_65_plus=agecat==4
label var agecat "age categories"
label var agecat_18_34      "Age 18-34"   
label var agecat_35_49      "Age 35-49"   
label var agecat_50_64      "Age 50-64"   
label var agecat_65_plus    "Age 65+" 


  
  
** Generate female, married, and race/ethnicity indicators
** (each is missing when the underlying survey variable is missing)
** 1 means person is female
gen       female = 1 - gender 
label var female "Female"
label val female noyes

** 1 means person is married
gen married =  marital==1 | marital==2 if marital < .
label var married "Married"
label val married noyes

** 1 means person is non-hispanic white
gen       nhwhite =  race==1 & hisplatino==0 if ~missing(race, hisplatino)
label var nhwhite "Non-Hispanic White"
label val nhwhite noyes

** 1 means person is non-hispanic black
gen       nhblack =  race==2 & hisplatino==0 if ~missing(race, hisplatino)
label var nhblack "Non-Hispanic Black"
label val nhblack noyes

** 1 means person is hispanic and any race
gen       hispanic = hisplatino==1           if ~missing(race, hisplatino)
label var hispanic "Hispanic, Any Race"
label val hispanic noyes

** 1 means person is non-hispanic and not white or black
gen       nhother = (race==3 | race==4 | race==5 | race==6) & hisplatino==0 if ~missing(race, hisplatino)
label var nhother "Other Race/Ethnicity"
label val nhother noyes

** Check to make sure all respondents were placed into race/ethnicity groups
assert nhwhite+nhblack+hispanic+nhother==1 if ~missing(race, hisplatino) 

** Generate education categories
** Assign new values to education index variable to correspond with each level of education,
** and generate dummies to represent each of these levels
recode education min/8=1 9=2 10/12=3 13=4 14/max=5, gen(edu_ix)	

** Check to make sure all education level values are now 1-5
assert    edu_ix <= 5 & edu_ix>=1 if edu_ix<.

** Label education index variable and all of the dummies representing education levels
label var edu_ix "Education Index, 1-5 Scale"
label def edu_ix 1 "HS Dropout" 2 "High School" 3 "Some College" 4 "Bachelors" 5 "Graduate Degree" 
label val edu_ix edu_ix

** Generate new binary variables representing each education level
** (missing whenever edu_ix is missing)
gen byte ed_dropout = edu_ix==1 if edu_ix <.
gen byte ed_hschool = edu_ix==2 if edu_ix <.
gen byte ed_somecol = edu_ix==3 if edu_ix <.
gen byte ed_college = edu_ix==4 if edu_ix <.
gen byte ed_graduat = edu_ix==5 if edu_ix <.

** Label each of the new education variables
** and label them as noyes variables
label var ed_dropout  "HS Dropout"
label var ed_hschool  "High School"
label var ed_somecol  "Some College"
label var ed_college  "Bachelor's Degree"
label var ed_graduat  "Graduate Degree"

label val ed_dropout noyes
label val ed_hschool noyes
label val ed_somecol noyes
label val ed_college noyes
label val ed_graduat noyes

** Generate family income categories
** Assign new values (1-5) to hhincome variable to correspond with each increasing 
** level of household income. Generate dummies to represent each of these levels
recode hhincome min/7=1 8/11=2 12/13=3 14=4 15/16=5, gen(income_cat)

** Check to make sure all household income level values are now between 1-5
assert income_cat <= 5 & income_cat>=1 if income_cat<.

** Label household income variable and each of the 5 different income levels
label var income_cat "Income categories (1-5)"
label def income_cat 1 "HhInc <25k" 2 "HhInc 25-50k" 3 "HhInc 50-75k" 4 "HhInc 75-100k" 5 "HhInc >=100k"
label val income_cat income_cat

** Also use the more detailed HRS-level income variable
** Assign new values (1-5) to the h12itot variable, corresponding to increasing
** levels of household income according to the HRS-style measure, and store
** them in hrsincome_cat
recode h12itot min/24999 = 1 25000/49999=2 50000/74999=3 75000/99999=4 100000/max=5, gen(hrsincome_cat)

** Check to make sure all hrs household income level values are now 1-5 or missing
assert hrsincome_cat==1|hrsincome_cat==2|hrsincome_cat==3|hrsincome_cat==4|hrsincome_cat==5|missing(hrsincome_cat)

** Use the more detailed HRS income measure if available
** Replace income_cat data with all corresponding non-missing hrs data
replace income_cat = hrsincome_cat if ~missing(hrsincome_cat)

** Create dummies with more intuitive names for household income categories
** (missing whenever income_cat is missing)
gen byte hinc_lt25   = income_cat==1 if income_cat <.
gen byte hinc_25_50  = income_cat==2 if income_cat <.
gen byte hinc_50_75  = income_cat==3 if income_cat <.
gen byte hinc_75_100 = income_cat==4 if income_cat <.
gen byte hinc_ge100  = income_cat==5 if income_cat <.

** Label new household income dummies
label var hinc_lt25   "Hh Income: Below 25k"
label var hinc_25_50  "Hh Income: 25k-50k"
label var hinc_50_75  "Hh Income: 50k-75k"
label var hinc_75_100 "Hh Income: 75k-100k"
label var hinc_ge100  "Hh Income: 100k or more"

** Generate and label household size and the number of kids in the household
gen byte hhsize=1
gen byte hhkids=0
label var hhsize "HH size counted from roster"
label var hhkids "# of kids in HH"

** Loop through all 10 (max) potential household members using foreach loop
** if that household member exists, add 1 to hhsize variable
** if they exist and are below 18 y.o., add 1 to hhkids
foreach num of numlist 1/10 {
	replace hhsize = hhsize + (hhmemberin_`num'==1)
	replace hhkids = hhkids + (hhmemberin_`num'==1)*(hhmemberage_`num'<18)
}
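
** Worked example (comments only, not executed; hypothetical household): a respondent
** living with two other members aged 40 and 10 has hhmemberin_1 = hhmemberin_2 = 1,
** so hhsize = 1 + 1 + 1 = 3 and hhkids = 0 + 1 = 1. Because Stata treats missing
** as larger than any number, (hhmemberage_X < 18) evaluates to 0 for a member with
** a missing age: such a member adds to hhsize but not to hhkids.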
** Make sure hhsize equals the reported number of other household members plus one (the respondent)
assert hhsize==hhmembernumber+1  if hhmembernumber <.

** Generate household size dummies for sizes of 1, 2, 3, and 4 or more persons
** including the respondent in this count
gen byte hhsiz_1  = hhsize==1 if hhsize <. 
gen byte hhsiz_2  = hhsize==2 if hhsize <.  
gen byte hhsiz_3  = hhsize==3 if hhsize <.  
gen byte hhsiz_4p = hhsize>=4 if hhsize <. 

** Make sure that all respondents' households are counted exactly once
assert hhsiz_1 + hhsiz_2 + hhsiz_3 + hhsiz_4p == 1

** Label dummies for household size including respondent 
label var hhsiz_1  "HhSize=1"
label var hhsiz_2  "HhSize=2"
label var hhsiz_3  "HhSize=3"
label var hhsiz_4p "HhSize>=4"

** Generate and label a binary variable for any kids in the household
** 1 means there is at least 1 kid in house, 0 means no kids
gen anykids = hhkids >=1 if hhkids <.
label var anykids "Any kids present in HH"
label val anykids noyes





			
//  ***************************************************************************************
// 			RECODE AND RELABEL FINANCIAL LITERACY AND COGNITION DATA
//  ***************************************************************************************

** Code the standard Lusardi & Mitchell financial literacy questions
** -----------------------------------------------------------------

**  Suppose you had $100 in a savings account and the interest rate was 2% per year. After 5 years, how much will your money grow?
gen l001_correct = l001==1 if l001<. // More than $102 is correct answer
label var l001_correct "FinLit compounding 2% correct"

**  Suppose you had $100 in a savings account and the interest rate was 20% per year and you never withdrew money. 
**  After five years, how much would you have total?
gen l002_correct = l002==1 if l002<. 
label var l002_correct "FinLit compounding 20% correct"

** Imagine interest rate was 1% and inflation was 2%. After 1 year how much could you buy?
gen l003_correct = l003==3 if l003<. 
label var l003_correct "FinLit low inflation correct"

** Assume a friend inherits $10,000 today and his sibling inherits $10,000 3 years from now. Who is richer today?
gen l004_correct = l004==1 if l004<.
label var l004_correct "FinLit discounting correct"

** Suppose that in the year 2020, your income has doubled and prices have doubled. How much can you buy?
gen l005_correct = l005==2 if l005<. 
label var l005_correct "FinLit high inflation correct"

** Which defines the functions of the stock market?
gen d001_correct = d001==3 if d001<. 
label var d001_correct "FinLit stock market description correct"
			
** Describe a mutual fund
gen d002_correct = d002==2 if d002<. 
label var d002_correct "FinLit mutual fund description correct"

** If interest rates rise/fall, what happens to bond prices? (Rise -> bond prices fall; fall -> bond prices rise;
** the direction in the question is randomized via p001_randomizer)
gen p001_correct = ((p001_randomizer==1 & p001==2)|(p001_randomizer==2 & p001==1)) if p001<. 
label var p001_correct "FinLit bond price correct"

** Which is safer: buying a single company's stock or a stock mutual fund? (The direction of the
** statement is randomized via p002_randomizer; the stock mutual fund is the safer choice.)
gen p002_correct = ((p002_randomizer==1 & p002==2)|(p002_randomizer==2 & p002==1)) if p002<. 
label var p002_correct "FinLit diversification correct"

** What is riskier, stocks or bonds? 
gen p003_correct = ((p003_randomizer==1 & p003==1)|(p003_randomizer==2 & p003==2)) if p003<. 
label var p003_correct "FinLit risk bonds vs stocks correct"

** Considering a long period, what normally gives the highest return?
gen p004_correct = p004==3 if p004<. 
label var p004_correct "FinLit risk long-run returns correct"

** Normally, which asset below displays the highest fluctuations over time?
gen p005_correct = p005==3 if p005<. 
label var p005_correct "FinLit risk asset fluctuations correct"

** When an investor spreads his money..does the risk..?
gen p006_correct = p006==2 if p006<. 
label var p006_correct "FinLit spreading money correct"

** Housing prices in the US can never go down?
gen p007_correct = p007==2 if p007<. 
label var p007_correct "FinLit housing prices correct"

** drop the underlying variables (not used anymore)
drop l001 l002 l003 l004 l005 d001 d002 p001_randomizer p001 p002 p002_randomizer p003 p003_randomizer p004 p005 p006 p007

** Generate variable to represent total # of correct FinLit answers
gen total_fin_lit = l001_correct + ///
                    l002_correct + ///
                    l003_correct + ///
                    l004_correct + ///
                    l005_correct + ///
                    d001_correct + ///
                    d002_correct + ///
                    p001_correct + ///
                    p002_correct + ///
                    p003_correct + ///
                    p004_correct + ///
                    p005_correct + ///
                    p006_correct + ///
                    p007_correct
             
label var total_fin_lit "FinLit questions correct 0-14"
 
** Calculate percentage of correct responses for each person
gen fin_lit_percent=total_fin_lit/14
label var fin_lit_percent "Fraction FinLit questions correct 0-1" 
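
** Note (comments only, not executed): in Stata, "+" propagates missing values, so
** total_fin_lit, and hence fin_lit_percent, is missing if any of the 14 items is
** missing. An alternative, egen's rowtotal(), would instead count missing items as
** 0, which is not what the code above does. Worked example: a respondent with 11
** correct answers has fin_lit_percent = 11/14, approx. 0.786.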


** Code the numeracy and literacy cognition measures
** -------------------------------------------------

** relabel the IRT-based numeracy score (from uas1)
rename uas1cog cog_numbers1
label var cog_numbers1 "Cog: Numeracy Score"


** Rename number series test data (from uas42)
rename uas42cog cog_numbers2
label var cog_numbers2 "Cog: Number Series Score"


** Rename picture vocabulary test data (from uas43)
rename uas43cog cog_verbal1
label var cog_verbal1  "Cog: Picture Vocabulary Score"

** Rename verbal analogies test data (from uas44)
rename uas44cog cog_verbal2
label var cog_verbal2  "Cog: Verbal Analogies Score"

	
	
	
	
	

// *******************************************************************************************
// CREATE COGNITION INDEX (for ease of interpretation in regressions)
// *******************************************************************************************
			
** Generate indicator for all cognition measures being nonmissing 
gen cogn_nonmiss = ~missing(total_fin_lit, cog_numbers1, cog_numbers2, cog_verbal1, cog_verbal2)

** Standardize the cognition subscores so that they are comparable
egen cog_fin = std(total_fin_lit)     if cogn_nonmiss
egen cog_n1  = std(cog_numbers1)      if cogn_nonmiss
egen cog_n2  = std(cog_numbers2)      if cogn_nonmiss
egen cog_v1  = std(cog_verbal1)       if cogn_nonmiss
egen cog_v2  = std(cog_verbal2)       if cogn_nonmiss

** Label standardized cognition subscore variables
label var cog_fin "Financial literacy (standardized)"
label var cog_n1 "Cog: Numeracy (standardized)"
label var cog_n2 "Cog: Number series (standardized)"
label var cog_v1 "Cog: Picture vocabulary (standardized)"
label var cog_v2 "Cog: Verbal analogies (standardized)"

** Create the cognition index as the first principal component of the subscores
** and standardize it
pca cog_fin cog_n1 cog_n2 cog_v1 cog_v2   if cogn_nonmiss
predict tmp                               if cogn_nonmiss
egen cognix_pca=std(tmp)
drop tmp
label var cognix_pca "Principal Comp. Cognition Score (standardized)"
sum  cognix_pca                           if cogn_nonmiss, d
			
** Create quartiles of cognition by which to split the sample
xtile cognix_xtile4 = cognix_pca if ~missing(cognix_pca), nq(4)
tab cognix_xtile4, m
label var cognix_xtile4 "Quartiles of the PCA cognition index"

			
** As a robustness check: the sum of the five standardized subscores (financial literacy and the four
** cognition tests), itself standardized. Summing and averaging give the same result after standardization.
gen tmp = cog_fin+cog_n1+cog_n2+cog_v1+cog_v2 if cogn_nonmiss
egen cognix_avg = std(tmp)
drop tmp
label var cognix_avg "Average Cognition Score (standardized)"
sum  cognix_avg                          if cogn_nonmiss, d
			
			
	
	
			
			
			
// *******************************************************************************************
// CREATE EXTRA DATA VARIABLES JUST FOR APPENDIX TABLES 8-10
// *******************************************************************************************
** The variables below are used only for specification 7 of Appendix 
** Table 8, Appendix Table 9, and Appendix Table 10
** All these variables relate to cognition or familiarity with financial instruments


** Score the Adult Decision-Making Competence (ADMC) measures on 3 dimensions, following 
** Sinayev and Peters (2015). Then re-score them following Bruine de Bruin et al. (2007).
** -------------------------------------------------------------------------------------

** Scoring of Consistency in Risk Perception 
** Consistency in Risk Perception assesses the ability to follow probability rules. 
gen time_conjunct=0 if !missing(admc1)
foreach x of numlist 13 14 16 17 19/22 { // each x is an "in five years" item; compare it to its "next year" counterpart
  local y=`x'-10 // the matching "next year" item is numbered 10 lower: 3 vs 13, 4 vs 14, 6 vs 16, ...
  replace time_conjunct=time_conjunct+1 if admc`y'>admc`x' // add 1 if the next-year event gets the higher probability
}
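
** Worked example of the loop above (hypothetical answers, for illustration only):
**   x = 13 gives y = 3, so admc3 ("next year") is compared with admc13 ("in five years").
**   The numlist 13 14 16 17 19/22 expands to 8 items, so time_conjunct ranges from 0 to 8.
**   admc3 = 40, admc13 = 30  ->  +1 (an event within the next year cannot be more likely
**                                   than the same event within five years)
**   admc3 = 30, admc13 = 40  ->  +0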

** Scoring of Framing Inconsistency
** complementary events (should add to 100% probability) are added, to see if they're within 10 points of 100
gen framing_inconsistency=0 if !missing(admc1)
gen framing_inconsistency_bruine=0 if !missing(admc1)
gen add1=admc3+admc12 //3 and 12 are complementary events, so here we're adding them together
gen add2=admc7 + admc10 //7 and 10 are complementary events, so here we're adding them together
gen add3=admc13 + admc22 //13 and 22 are complementary events, so here we're adding them together
gen add4=admc17 + admc20 //17 and 20 are complementary events, so here we're adding them together

foreach x of numlist 1/4 {
replace framing_inconsistency=framing_inconsistency+1 if add`x'<90 | add`x'>110 //add 1 to score if not within window around 100
replace framing_inconsistency_bruine=framing_inconsistency_bruine+1 if add`x'!=100 //add 1 to score if not exactly 100 (Bruine scoring has no window)
drop add`x'
}
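
** Worked example of the two framing rules (hypothetical answers, for illustration only):
**   admc3 = 60, admc12 = 45  ->  add1 = 105
**   framing_inconsistency        : 105 lies within [90, 110]  ->  +0
**   framing_inconsistency_bruine : 105 is not exactly 100     ->  +1
** With 4 complementary pairs, each framing score ranges from 0 to 4.
** Because Stata treats missing values as larger than any number, a pair with a missing
** item (for a respondent who answered admc1) also adds 1 under both rules.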

** Scoring of Subset Fallacy
** add 1 to score if a subset event is marked as higher probability than a superset event
gen subset_fallacy=0 if !missing(admc1)
replace subset_fallacy=subset_fallacy+1 if admc4>admc11 //comparing subset to superset event
replace subset_fallacy=subset_fallacy+1 if admc14>admc21 //comparing subset to superset event

** Re-scoring of Consistency in Risk Perception, following Bruine de Bruin et al. (2007).
** This is the fraction correct out of all 14 probability pairs.
** It uses the stricter (no-window) framing_inconsistency_bruine scoring for the framing component.
** Note: admc_bruine is an alternative to using the 3 sub-scores above separately.
gen admc_bruine  = 1 - (framing_inconsistency_bruine + time_conjunct + subset_fallacy)/14

** also create the subscores on a 0-1 scale, and a combined score on 0-1 scale
gen admc_time    = 1 - time_conjunct/8
gen admc_frame   = 1 - framing_inconsistency/4
gen admc_subset  = 1 - subset_fallacy/2
gen admc_totscore= 1 - (framing_inconsistency + time_conjunct + subset_fallacy)/14
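
** Where the denominators come from: 14 probability pairs = 8 time-conjunction pairs
** + 4 complementary framing pairs + 2 subset/superset pairs.
** Worked example (hypothetical scores): time_conjunct = 2, framing_inconsistency = 1, subset_fallacy = 0
**   admc_time     = 1 - 2/8  = 0.75
**   admc_frame    = 1 - 1/4  = 0.75
**   admc_subset   = 1 - 0/2  = 1
**   admc_totscore = 1 - 3/14 = 0.786 (rounded)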

** and standardize them for comparability
foreach x in bruine time frame subset totscore {
  egen     admc_S`x' = std(admc_`x')
}

** Show scores
sum admc_*

** Dummy out missing values: admc__x equals admc_x with missings set to 0, and admc__x_m flags them
foreach x in bruine time frame subset totscore Sbruine Stime Sframe Ssubset Stotscore {
  gen     admc__`x'_m = missing(admc_`x')
  gen     admc__`x'   = admc_`x'
  replace admc__`x'   = 0  if admc__`x'_m
}
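
** What the dummy-out step does, for two hypothetical respondents:
**   admc_time = .     ->  admc__time = 0,    admc__time_m = 1
**   admc_time = 0.75  ->  admc__time = 0.75, admc__time_m = 0
** Presumably the appendix specifications include admc__x together with admc__x_m, so respondents
** with a missing score stay in the sample and receive their own intercept shift.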

** The raw ADMC items are no longer needed
drop admc1-admc22



** Now we create an index of SS literacy questions (Social Security Attitudes)
** The questions are q12_ and q10*_ (several sub-questions of q10 exist)
** Q12 is multiple choice and q10* are all True/False
** The data already include a 0/1 variable q*_correct for each question, which is what we use below 
** -----------------------------------------------------------------------------------------------------
egen miss=rowmiss(q12_correct q10*_correct) // mark people with missing data
egen ss_literacy=rowmean(q12_correct q10*_correct) if miss==0  // proportion correct, only for respondents who answered all items
drop miss

** Show scores
d   q2c s7a q6a q12* q10* ss_literacy
sum q2c s7a q6a q12* q10* ss_literacy

tab q2c 
tab s7a 
tab q6a 
tab q12

** prefix these questions with ssa (Attitudes on Social Security)
gen ssa_literacy  = ss_literacy
gen ssa_knowledge = q2c
gen ssa_confident = q6a

** code more knowledge as higher value (on 0-1 scale)
recode ssa_knowledge 5=. 4=0 3=0.33 2=0.67 1=1

** code more confident as higher value (on 0-1 scale), and "don't know" as missing
recode ssa_confident 5=. 4=0 3=0.33 2=0.67 1=1

** and standardize them for comparability
foreach x in literacy knowledge confident {
  egen     ssa_S`x' = std(ssa_`x')
}

** Dummy out missing values
foreach x in literacy knowledge confident Sliteracy Sknowledge Sconfident {
  gen     ssa__`x'_m = missing(ssa_`x')
  gen     ssa__`x'   = ssa_`x'
  replace ssa__`x'   = 0  if ssa__`x'_m
}

sum ssa_*

** no longer needed
drop q2c s7a q6a q12* q10* ss_literacy



** Create an index of ability and comfort with financial planning
** ---------------------------------------------------------------------------------
** ch009a "Have enough information"             --> agree (1) = more ability/comfort  --> REVERSE code
** ch009b "Not interested in learning about ret planning". Unclear what this means. It could be a very able person
**         who doesn't need more info, or someone with their head in the sand         --> OMIT
** ch009c "don't know best source of info"      --> agree (1) = less ability/comfort  --> Regular code
** ch009d "comfortable with online bank"        --> agree (1) = more ability/comfort  --> REVERSE code
** ch009e "comfortable with online search ret"  --> agree (1) = more ability/comfort  --> REVERSE code
** ch009f "comfortable with online search gov"  --> agree (1) = more ability/comfort  --> REVERSE code

** Combine the items with the signs above (higher = more ability/comfort) and standardize the index
egen aplan_index =std(-ch009a+ch009c-ch009d-ch009e-ch009f)
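
** How the signs work, as a hypothetical example that assumes higher codes mean weaker agreement
** (e.g., 1 = agree ... 4 = disagree; the exact response scale is not shown here):
**   Respondent A answers 1 to all five items:        -1 + 1 - 1 - 1 - 1 = -3
**   Respondent B answers 1 except ch009c = 4:         -1 + 4 - 1 - 1 - 1 =  0
** B, who disagrees with "don't know best source of info", gets the higher (pre-standardization) value.
** The sum is missing if any of the five items is missing, so aplan_index is then missing too.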

sum aplan_index, d

** Dummy out missing values
foreach x in index {
  gen     aplan__`x'_m = missing(aplan_`x')
  gen     aplan__`x'   = aplan_`x'
  replace aplan__`x'   = 0  if aplan__`x'_m
}

sum aplan*

** no longer needed
drop ch009* 

** Create indicators for annuity holdings or IRA/Keogh holdings
** ---------------------------------------------------------------------------------

** Create indicator variable for annuity holdings (self or spouse)
tab q273_, m
gen afin_annuity = q273_==1  if !missing(q273_)

** Create indicator variable for owning IRA or Keogh
tab q162_, m
gen afin_irakeogh = q162_==1 if !missing(q162_)

** Dummy out missing values
foreach x in annuity irakeogh {
  gen     afin__`x'_m = missing(afin_`x')
  gen     afin__`x'   = afin_`x'
  replace afin__`x'   = 0  if afin__`x'_m
}

sum afin*

* no longer needed
drop q273_ q162_ 






// ***************************************************************************************
// 	SET GLOBAL VARIABLES FOR REGRESSIONS AND OTHER ANALYSES
// ***************************************************************************************


** Demographics global for balance tests
global demographics_balance age agesq female married                                   ///
	                        nhwhite     nhblack nhother hispanic                       ///
	                        ed_dropout ed_hschool ed_somecol ed_college ed_graduat     ///
	                        hinc_lt25   hinc_25_50 hinc_50_75 hinc_75_100 hinc_ge100   ///
	                        hhsiz_1     hhsiz_2 hhsiz_3 hhsiz_4p anykids
							
	 
** Demographics global for regressions (omits the reference categories nhwhite, ed_somecol, hinc_lt25, hhsiz_1)
global demographics         age agesq female married                                   ///
	                                    nhblack nhother hispanic                       ///
	                        ed_dropout ed_hschool            ed_college ed_graduat     ///
	                                    hinc_25_50 hinc_50_75 hinc_75_100 hinc_ge100   ///
	                                    hhsiz_2 hhsiz_3 hhsiz_4p anykids
										
** Experimental controls global for balance tests
global exp_controls_balance sell_first                                                        ///
	                        ls_startvalue_1 ls_startvalue_2 ls_startvalue_3 ls_first          ///
	                        ss_benefit_1    ss_benefit_2 ss_benefit_3 ss_benefit_4            ///
	                        vignette_name_1 vignette_name_2 vignette_name_3 vignette_name_4 

** Experimental controls for regressions (omits the reference categories ls_startvalue_1, ss_benefit_1, vignette_name_1)
global exp_controls         sell_first                                                        ///
	                                        ls_startvalue_2 ls_startvalue_3 ls_first          ///
	                                        ss_benefit_2 ss_benefit_3 ss_benefit_4            ///
	                                        vignette_name_2 vignette_name_3 vignette_name_4 

** Experimental controls for regressions, but with a linear SS benefit variable (rather than the SS benefit dummies)
global exp_controls_linben  sell_first                                                        ///
                                            ls_startvalue_2 ls_startvalue_3 ls_first          ///
                                            ss_benefit100dollar                               ///
                                            vignette_name_2 vignette_name_3 vignette_name_4 
	







// ***************************************************************************************
// ***************************************************************************************
// 
// SAMPLE SELECTION
//
// ***************************************************************************************
// ***************************************************************************************


** Examine missing values in outcome data
** See overlap between the two types of missing information
gen miss_buy    = missing(logbuyprice)
gen miss_sell   = missing(logsellprice)
gen miss_spread = miss_buy | miss_sell
tab miss_buy miss_sell, m

** Check whether missing values occur disproportionately in any treatment arm
tab treat2x3 if miss_spread

** Examine how often demographic data is missing
foreach var of varlist age gender education race hisplatino marital income_cat hhsize hhkids {
	qui gen miss_`var'= missing(`var')
	di "Number of missings for `var':"
	count if miss_`var'
	di ""
	qui drop miss_`var'
	}
gen     miss_anydemographic = missing(age, gender, education, race, hisplatino, marital, income_cat, hhsize, hhkids)
tab     miss_anydemographic

** Examine how often the cognition index is missing
** As we saw before, most of these missing values come from missing finlit questions
gen miss_cognix_pca =   missing(cognix_pca)
tab miss_cognix_pca


************* INDICATOR FOR BASELINE SAMPLE ***************
**
** Baseline sample requires that outcome variable, demographics, and cognition are all nonmissing
** 

gen byte basesample = ~(miss_spread | miss_anydemographic | miss_cognix_pca)			

************************************************************
	

** Double-check that no variable in the demographics global has missing values in the base sample
foreach var of varlist $demographics {
	qui assert `var' < . if basesample
	}


** Make sure that cognition measures are standardized on the basesample
** (rather than standardized on a broader sample)
sum cognix_pca if basesample
replace cognix_pca = (cognix_pca-r(mean))/r(sd)

sum cognix_avg if basesample
replace cognix_avg = (cognix_avg-r(mean))/r(sd)



** Indicator for not completing the entire survey
** Note: Some of these are still in our base sample because they did answer the key outcome 
**       variables and attrited on later questions
gen byte  attrit = missing_end_date
label var attrit "Did not complete the entire survey (including the insurance questions that we don't use)"

** For reporting in the paper
** From the survey administrators: number of invited participants: 5,521

** Percent opening the survey
count if ~missing_start_date
scalar N_start=r(N)
di "Percent of all invited panel members that opened the survey: " 100*N_start/5521
 

** Percent completing the buy and sell questions conditional on opening the survey
count if ~miss_spread
scalar N_complete=r(N)
di "Percent of respondents to the invitation that completed the annuity questions: " 100*N_complete/N_start

** Overall completion rate (for our section)
di "Percent of all invited panel members that completed the annuity questions: " 100*N_complete/5521
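
** The three rates above are linked algebraically:
**   N_complete/5521 = (N_start/5521) * (N_complete/N_start)
** e.g., hypothetically, if 80% of invitees opened the survey and 90% of those completed the annuity
** questions, the overall completion rate would be 0.80 * 0.90 = 72%.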

** Percent of respondents we drop due to missing demographics or cognition questions
count if miss_anydemographic & ~miss_spread
di "Percent dropped for missing demographics: " 100*r(N)/N_complete

count if miss_cognix_pca & ~miss_anydemographic & ~miss_spread
di "Additional percent dropped for missing cognition measure: " 100*r(N)/N_complete

** Store each variable in the most compact storage type that holds its values
quietly compress








	
	
	
// ***************************************************************************************
// ***************************************************************************************
// 
//  PART 2: ANALYSES
//
// ***************************************************************************************
// ***************************************************************************************

	
	
	
	
	
	
	
// ***************************************************************************************
// 			Figure 1: CDF of Sell Price and Buy Price in the Subsample without Anchoring
//				(When buy price and sell price are each shown first)
// ***************************************************************************************
   
** Export data for first line of figure 1: buy price
outsheet midbuy  if ~sell_first & basesample using rawdata_fig1a_midbuy.xls, replace

** Export data for second line of figure 1: sell price
outsheet midsell if  sell_first & basesample using rawdata_fig1b_midsell.xls, replace

** Summarize to get the medians, and run median regressions (qreg) to get their standard errors
sum  midbuy  if ~sell_first & basesample, d
qreg midbuy  if ~sell_first & basesample

sum  midsell if  sell_first & basesample, d
qreg midsell if  sell_first & basesample
   
   

// ***************************************************************************************
// 			Figure 2: CDF of Log Sell Price Minus Log Buy Price
// ***************************************************************************************
   
** Export data for figure 2: log(sell) - log(buy)
outsheet logsellbuydiff if basesample using rawdata_fig2_logdiff.xls, replace

** Summary statistics for the mean and median lines in the figure  
sum logsellbuydiff if basesample, d   
   
         
   
   
// ***************************************************************************************
// 			APX Figure A1: CDF of Sell Price and Buy Price in the Entire Baseline Sample
// ***************************************************************************************
      
** Export data for first line of figure A1: buy price
outsheet midbuy  if  basesample using rawdata_figA1a_midbuy.xls, replace

** Export data for second line of figure A1: sell price
outsheet midsell if  basesample using rawdata_figA1b_midsell.xls, replace

** Summarize to get the medians, and run median regressions (qreg) to get their standard errors
sum  midbuy  if basesample, d
qreg midbuy  if basesample
sum  midsell if basesample, d
qreg midsell if basesample

   
   
   
   
// ***************************************************************************************
// 			APX Figure A2: CDF of Sell-Buy Spread
// ***************************************************************************************
   
** Figure A2: CDF of spread
** Export data for figure A2: spread = absolute value of log(sell) - log(buy)
outsheet logspread if basesample using rawdata_figA2_spread.xls, replace

 
** Summary statistics for the mean and median lines in the figure  
sum logspread if basesample, d


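** Wilcoxon matched-pairs signed-rank test with exact statistics
** (signrankex is a user-written command, not part of official Stata; install it first, e.g., locate it with -findit signrankex-)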
signrankex midbuy=midsell if basesample












	
// ***************************************************************************************
//			TABLE 1: Describes the Experimental Design; Not Produced by This Stata File
// ***************************************************************************************






	
// ***************************************************************************************
//			TABLE 2: Descriptive Statistics on the Sell Price, Buy Price, and Spread
// ***************************************************************************************


** Show the summary statistics of the logged outcome variables, including the spread,
** in a table.
**
** In the tables:
**    Four rows:
**       1. "Sell price (log)"
**       2. "Buy price (log)"
**       3. "Sell-Buy Spread"
**       4. "N"  (no. of observations)
**
**    Four main columns (but use separate columns in xls for mean and standard deviation, so 7 columns in excel)
**       1. Sell question asked first
**          Show the mean and sd in two xls columns for main column 1
**       2. Buy question asked first
**          Show the mean and sd in two xls columns for main column 2
**       3. P-value on the difference in means between col 1 and col 2
**          This p-value is reported under "P>|t|" on sell_first in the regressions below.
**       4. Entire baseline sample: show the mean and sd
**
   
** The means and sd for column 1:
sum logsellprice logbuyprice logspread if  sell_first & basesample
   
** The means and sd for column 2:
sum logsellprice logbuyprice logspread if ~sell_first & basesample

** The means and sd for column 4:
sum logsellprice logbuyprice logspread if basesample

** regressions for the p-values in column 3 for rows 1-3
** The p-value is reported under "P>|t|" on sell_first in the regressions below.

* P-value for Row 1 
reg logsellprice sell_first if basesample, robust

* P-Value for Row 2
reg logbuyprice  sell_first if basesample, robust

* P-Value for Row 3
reg logspread    sell_first if basesample, robust

  
  
  
  
   

	
// ***************************************************************************************
// 			TABLE 3: MAIN REGRESSION; Treatment Effects on the Sell-Buy Spread and its Components
// ***************************************************************************************

** Table 3
**
** This table has 3 main columns, with each main column having two entries, namely
** for the coefficient estimate and the standard error.
**    Col 1: "Sell-Buy Spread"
**    Col 2: "Sell Price (log)"
**    Col 3: "Buy Price (log)"
**
** In the rows of the table are the explanatory variables:
**    Row 1: "Complexity treatment"
**    Row 2: "Consequence message treatment"
**    Row 3: "Cognition index"
**    Row 4: "Sell question first"
**
**    Row 5: "P-value on lump-sum starting values" 
**    Row 6: "P-value on lump-sum shown first" 
**    Row 7: "P-value on SS benefit amounts"
**    Row 8: "P-value on vignette names"
**    
**    Row 9: "Demographic controls"   (enter "Yes" in each column)
**   
**    Row 10: R^2
**    Row 11: N
**


** ------ Column 1 --------
** Estimates for rows 1-4, row 6 (the p-value on lump-sum shown first: "P>|t|" on ls_first), R2, and N are in the output below	
reg logspread    any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust

* Row 5: P-Value on Lump-Sum Starting Values
testparm ls_startvalue_*

* Row 7: P-Value on SS Benefit Amounts
testparm ss_benefit_*

* Row 8: P-Value on Vignette Names
testparm vignette_name_*



** ------ Column 2 --------
** Estimates for rows 1-4, row 6 (the p-value on lump-sum shown first: "P>|t|" on ls_first), R2, and N are in the output below	
reg logsellprice any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust

** Row 5: P-Value on LS Starting Values
testparm ls_startvalue_*

** Row 7: P-Value on SS Benefit Amounts
testparm ss_benefit_*

** Row 8: P-Value on Vignette Names
testparm vignette_name_*



* ------ Column 3 --------
* Estimates for rows 1-4, row 6 (the p-value on lump-sum shown first: "P>|t|" on ls_first), R2, and N are in the output below	
reg logbuyprice  any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust

* Row 5: P-Value on LS Starting Values
testparm ls_startvalue_*

* Row 7: P-Value on SS Benefit Amounts
testparm ss_benefit_*

* Row 8: P-Value on Vignette Names
testparm vignette_name_*


	
	
	
	
	


// ***************************************************************************************
// 			TABLE 4: HETEROGENEITY IN TREATMENT EFFECTS
// ***************************************************************************************
    
** ------- Specification 1: "By Consequence Message" ---------
**
** Two Rows:   1. "No consequence message"
**             2. "Consequence message"
**
** Note: here the column with coefficients on consequence message stays empty
** In the row "No consequence message", the coefficient on anycompXcons0 appears in the column for "Complexity Treatment"
** In the row "Consequence message", the coefficient on anycompXcons1 appears in the column for "Complexity Treatment"

** Create interaction variables of complexity with consequence dummies
gen anycompXcons0= any_complexity*(consequence==0)
gen anycompXcons1= any_complexity*(consequence==1)
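
** How the two interactions split the complexity treatment across the consequence-message arms:
**   any_complexity  consequence  anycompXcons0  anycompXcons1
**         0              0              0              0
**         1              0              1              0
**         0              1              0              0
**         1              1              0              1
** Hence anycompXcons0 + anycompXcons1 = any_complexity, and with consequence also in the model each
** coefficient is the complexity effect within one consequence-message arm.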

** Output for specification 1
reg logspread   anycompXcons*  consequence cognix_pca $exp_controls $demographics if basesample, robust
 
** P-value in the Complexity treatment column on test of equal coefficients:
testparm anycompXcons*, equal

** P-value in the Consequence message treatment column: 
** Stays empty (N/A)


** --------- Specification 2: "By Complexity Treatment" ----------
**
** Two Rows:   1. "No complexity treatment"
**             2. "Complexity treatment"
**
** Note: here the column with coefficients on complexity treatment stays empty
** Note: This is the same regression as in specification 1, just with a different linear combination of the 
**       regressors (note that all other coefficients, R2, RMSE, F, etc. are identical)

** Create interaction variables of consequence message with complexity dummies
gen consXanycomp0= consequence*(any_complexity==0)
gen consXanycomp1= consequence*(any_complexity==1)

** Output for specification 2
reg logspread   any_complexity consXanycomp*  cognix_pca $exp_controls $demographics if basesample, robust
 
** P-value in the Complexity treatment column:
** Stays empty (N/A)


** P-value in the Consequence message treatment column on test of equal coefficients: 
testparm consXanycomp*, equal


** ---------- Specification 3: "By Cognition" ------------------
**
** Two Rows:   1. "Below median cognition index"
**             2. "Above median cognition index"
**

** Find the median of the cognition index within the base sample; r(p50) from this summarize is used below
sum cognix_pca if basesample, d
assert cognix_pca<. if basesample

** Create interaction variables
gen cogn_abovemed = cognix_pca > r(p50)

gen anycompXcogn0= any_complexity*(cognix_pca <= r(p50))
gen anycompXcogn1= any_complexity*(cognix_pca >  r(p50))

gen consXcogn0= consequence*(cognix_pca <= r(p50))
gen consXcogn1= consequence*(cognix_pca >  r(p50))


** Output for specification 3
reg logspread   anycompXcogn* consXcogn* cogn_abovemed cognix_pca $exp_controls $demographics if basesample, robust
 
** P-value in the Complexity treatment column on test of equal coefficients:
testparm anycompXcogn*, equal
 
** P-value in the Consequence message treatment column on test of equal coefficients: 
testparm consXcogn*, equal


** ----------- Specification 4: "By level of Social Security Benefits" -------------
**
** Two Rows:   1. "Below median ($800 or $1200 per month)"
**             2. "Above median ($1600 or $2000 per month)"
**

** Run as a split by the median level

** Check to see what the median level of SS Benefits is
tab ss_benefit if basesample, m
assert ss_benefit<. if basesample

** Create interaction variables
** ss_benefit categories 3 and 4 ($1600 and $2000 per month) are above the median
gen ssb_abovemed = ss_benefit > 2

gen anycompXssb0= any_complexity*(1-ssb_abovemed)
gen anycompXssb1= any_complexity*(ssb_abovemed)

gen consXssb0= consequence*(1-ssb_abovemed)
gen consXssb1= consequence*(ssb_abovemed)

** Output for specification 4
reg logspread   anycompXssb* consXssb* ssb_abovemed cognix_pca $exp_controls $demographics if basesample, robust
 
* P-value in the Complexity treatment column on test of equal coefficients:
testparm anycompXssb*, equal
 
* P-value in the Consequence message treatment column on test of equal coefficients: 
testparm consXssb*, equal

* Number of observations for each row (shown in square brackets in the last column); check that they add up to the N of the regression
tab ssb_abovemed if basesample	
	
	
	
	
	
	

	
// ***************************************************************************************
//			APX TABLE A01: SUMMARY STATISTICS AND COMPARISON TO THE CPS
// ***************************************************************************************
		
** Column 1: Summary statistics on the UAS DATA in our baseline sample
tabstat agecat_* $demographics_balance if basesample, statistics(mean sd N min max) columns(statistics)



	
** Column 2: Summary Statistics from the 2016 CPS
** 
** CPS abstract obtained from the IPUMScps (http://cps.ipums.org/cps/)
**
** Year: 2016 Annual Social and Economic Supplement (taken in March; 2017 ASEC available, but 2016 used to match timing of UAS sample)
** Age selection: 18+

preserve
clear

quietly infix              ///
  int     year      1-4    ///
  long    serial    5-9    ///
  byte    numprec   10-11  ///
  double  hwtsupp   12-21  ///
  byte    gq        22-22  ///
  byte    asecflag  23-23  ///
  double  hhincome  24-31  ///
  byte    month     32-33  ///
  byte    pernum    34-35  ///
  double  wtsupp    36-45  ///
  byte    nchild    46-46  ///
  byte    age       47-48  ///
  byte    sex       49-49  ///
  int     race      50-52  ///
  byte    marst     53-53  ///
  int     hispan    54-56  ///
  int     educ      57-59  ///
  using `"cps_ASEC2016.dat"'

replace hwtsupp  = hwtsupp  / 10000
replace wtsupp   = wtsupp   / 10000

format hwtsupp  %10.4f
format hhincome %8.0f
format wtsupp   %10.4f

label var year     `"Survey year"'
label var serial   `"Household serial number"'
label var numprec  `"Number of person records following"'
label var hwtsupp  `"Household weight, Supplement"'
label var gq       `"Group Quarters status"'
label var asecflag `"Flag for ASEC"'
label var hhincome `"Total household income"'
label var month    `"Month"'
label var pernum   `"Person number in sample unit"'
label var wtsupp   `"Supplement Weight"'
label var nchild   `"Number of own children in household"'
label var age      `"Age"'
label var sex      `"Sex"'
label var race     `"Race"'
label var marst    `"Marital status"'
label var hispan   `"Hispanic origin"'
label var educ     `"Educational attainment recode"'

label define gq_lbl 0 `"NIU (Vacant units)"'
label define gq_lbl 1 `"Households"', add
label define gq_lbl 2 `"Group Quarters"', add
label values gq gq_lbl

label define asecflag_lbl 1 `"ASEC"'
label define asecflag_lbl 2 `"March Basic"', add
label values asecflag asecflag_lbl

label define sex_lbl 1 `"Male"'
label define sex_lbl 2 `"Female"', add
label define sex_lbl 9 `"NIU"', add
label values sex sex_lbl

label define race_lbl 100 `"White"'
label define race_lbl 200 `"Black/Negro"', add
label define race_lbl 300 `"American Indian/Aleut/Eskimo"', add
label define race_lbl 650 `"Asian or Pacific Islander"', add
label define race_lbl 651 `"Asian only"', add
label define race_lbl 652 `"Hawaiian/Pacific Islander only"', add
label define race_lbl 700 `"Other (single) race, n.e.c."', add
label define race_lbl 801 `"White-Black"', add
label define race_lbl 802 `"White-American Indian"', add
label define race_lbl 803 `"White-Asian"', add
label define race_lbl 804 `"White-Hawaiian/Pacific Islander"', add
label define race_lbl 805 `"Black-American Indian"', add
label define race_lbl 806 `"Black-Asian"', add
label define race_lbl 807 `"Black-Hawaiian/Pacific Islander"', add
label define race_lbl 808 `"American Indian-Asian"', add
label define race_lbl 809 `"Asian-Hawaiian/Pacific Islander"', add
label define race_lbl 810 `"White-Black-American Indian"', add
label define race_lbl 811 `"White-Black-Asian"', add
label define race_lbl 812 `"White-American Indian-Asian"', add
label define race_lbl 813 `"White-Asian-Hawaiian/Pacific Islander"', add
label define race_lbl 814 `"White-Black-American Indian-Asian"', add
label define race_lbl 815 `"American Indian-Hawaiian/Pacific Islander"', add
label define race_lbl 816 `"White-Black--Hawaiian/Pacific Islander"', add
label define race_lbl 817 `"White-American Indian-Hawaiian/Pacific Islander"', add
label define race_lbl 818 `"Black-American Indian-Asian"', add
label define race_lbl 819 `"White-American Indian-Asian-Hawaiian/Pacific Islander"', add
label define race_lbl 820 `"Two or three races, unspecified"', add
label define race_lbl 830 `"Four or five races, unspecified"', add
label define race_lbl 999 `"Blank"', add
label values race race_lbl

label define marst_lbl 1 `"Married, spouse present"'
label define marst_lbl 2 `"Married, spouse absent"', add
label define marst_lbl 3 `"Separated"', add
label define marst_lbl 4 `"Divorced"', add
label define marst_lbl 5 `"Widowed"', add
label define marst_lbl 6 `"Never married/single"', add
label define marst_lbl 7 `"Widowed or Divorced"', add
label define marst_lbl 9 `"NIU"', add
label values marst marst_lbl

label define hispan_lbl 000 `"Not Hispanic"'
label define hispan_lbl 100 `"Mexican"', add
label define hispan_lbl 102 `"Mexican American"', add
label define hispan_lbl 103 `"Mexicano/Mexicana"', add
label define hispan_lbl 104 `"Chicano/Chicana"', add
label define hispan_lbl 108 `"Mexican (Mexicano)"', add
label define hispan_lbl 109 `"Mexicano/Chicano"', add
label define hispan_lbl 200 `"Puerto Rican"', add
label define hispan_lbl 300 `"Cuban"', add
label define hispan_lbl 400 `"Dominican"', add
label define hispan_lbl 500 `"Salvadoran"', add
label define hispan_lbl 600 `"Other Hispanic"', add
label define hispan_lbl 610 `"Central/South American"', add
label define hispan_lbl 611 `"Central American, (excluding Salvadoran)"', add
label define hispan_lbl 612 `"South American"', add
label define hispan_lbl 901 `"Do not know"', add
label define hispan_lbl 902 `"N/A (and no response 1985-87)"', add
label values hispan hispan_lbl

label define educ_lbl 000 `"NIU or no schooling"'
label define educ_lbl 001 `"NIU or blank"', add
label define educ_lbl 002 `"None or preschool"', add
label define educ_lbl 010 `"Grades 1, 2, 3, or 4"', add
label define educ_lbl 011 `"Grade 1"', add
label define educ_lbl 012 `"Grade 2"', add
label define educ_lbl 013 `"Grade 3"', add
label define educ_lbl 014 `"Grade 4"', add
label define educ_lbl 020 `"Grades 5 or 6"', add
label define educ_lbl 021 `"Grade 5"', add
label define educ_lbl 022 `"Grade 6"', add
label define educ_lbl 030 `"Grades 7 or 8"', add
label define educ_lbl 031 `"Grade 7"', add
label define educ_lbl 032 `"Grade 8"', add
label define educ_lbl 040 `"Grade 9"', add
label define educ_lbl 050 `"Grade 10"', add
label define educ_lbl 060 `"Grade 11"', add
label define educ_lbl 070 `"Grade 12"', add
label define educ_lbl 071 `"12th grade, no diploma"', add
label define educ_lbl 072 `"12th grade, diploma unclear"', add
label define educ_lbl 073 `"High school diploma or equivalent"', add
label define educ_lbl 080 `"1 year of college"', add
label define educ_lbl 081 `"Some college but no degree"', add
label define educ_lbl 090 `"2 years of college"', add
label define educ_lbl 091 `"Associate's degree, occupational/vocational program"', add
label define educ_lbl 092 `"Associate's degree, academic program"', add
label define educ_lbl 100 `"3 years of college"', add
label define educ_lbl 110 `"4 years of college"', add
label define educ_lbl 111 `"Bachelor's degree"', add
label define educ_lbl 120 `"5+ years of college"', add
label define educ_lbl 121 `"5 years of college"', add
label define educ_lbl 122 `"6+ years of college"', add
label define educ_lbl 123 `"Master's degree"', add
label define educ_lbl 124 `"Professional school degree"', add
label define educ_lbl 125 `"Doctorate degree"', add
label define educ_lbl 999 `"Missing/Unknown"', add
label values educ educ_lbl

** Check that we only have ASEC observations
assert asecflag==1

** Drop institutionalized population
drop if gq==2
drop gq 

** gender
tab sex, m
gen female=sex==2 if sex<.

** age categories
recode age (18/34=1 "18-34") (35/49=2 "35-49") (50/64=3 "50-64") (65/85=4 "65+"), gen(agecat)
tab agecat
assert agecat==1|agecat==2|agecat==3|agecat==4
gen agecat_18_34=agecat==1
gen agecat_35_49=agecat==2
gen agecat_50_64=agecat==3
gen agecat_65_plus=agecat==4

** race/ethnicity
tab race, m
tab race, sum(race)

tab hispan, m
tab hispan, sum(hispan)

gen race4=race
recode race4 100=1 200=2 300/830=4
replace race4=3 if hispan>0 & hispan<.

tab race4
label def race4 1 "Non-H Wh" 2 "Non-H Bl" 3 "Hispanic" 4 "Other" 
label val race4 race4

gen byte white=race4==1
gen byte black=race4==2
gen byte hisp =race4==3
gen byte other=race4==4
label var white "Non-Hispanic White"
label var black "Non-Hispanic Black"
label var hisp  "Hispanic"
label var other "Non-Hispanic, other race"
assert (white+black+hisp+other)==1

** education
tab educ, m
tab educ, sum(educ)

gen educ5=educ
recode educ5 2/71=1 73=2 80/110=3 111=4 120/125=5
label def educ5 1 "HS dropout" 2 "HS" 3 "Some College"  4 "Bachelor's degree" 5 "Graduate degree"
label val educ5 educ5

tab educ5

gen byte edudo  = educ5==1
gen byte eduhs  = educ5==2
gen byte edusc  = educ5==3
gen byte edubd  = educ5==4
gen byte edugd  = educ5==5
assert (edudo+eduhs+edusc+edubd+edugd)==1

label var edudo "High School Dropout"
label var eduhs "High School Education"
label var edusc "Some college"
label var edubd "Bachelor's degree"
label var edugd "Graduate degree"

** marital status
tab marst, m
tab marst, sum(marst)

gen byte xmarried  = marst==1|marst==2
gen byte xsingle   = marst==3|marst==4|marst==5|marst==6
assert (xmarried+xsingle)==1

label var xmarried "Married"
label var xsingle "Single"
     
          
** household size
tab numprec, m 
gen pphhsize=numprec

gen hhsize_1=pphhsize==1
gen hhsize_2=pphhsize==2
gen hhsize_3=pphhsize==3
gen hhsize_4p=pphhsize>=4
assert (hhsize_1+hhsize_2+hhsize_3+hhsize_4p)==1

label var hhsize_1 "Household size of one"
label var hhsize_2 "Household size of two"
label var hhsize_3 "Household size of three"
label var hhsize_4p "Household size of four or more"

          
** household income
gen hhinc5=hhincome
recode hhinc5 min/24999=1 25000/49999=2 50000/74999=3 75000/99999=4 100000/max=5
tab hhinc5

gen inc00_25  = hhinc5==1
gen inc25_50  = hhinc5==2
gen inc50_75  = hhinc5==3
gen inc75_100 = hhinc5==4
gen inc100p   = hhinc5==5
assert (inc00_25+inc25_50+inc50_75+inc75_100+inc100p)==1

label var inc00_25 "Household income: Below 25k"
label var inc25_50 "Household income: 25k-50k"
label var inc50_75 "Household income: 50k-75k"
label var inc75_100 "Household income: 75k-100k"
label var inc100p "Household income: Above 100k"

** children present in household
gen nchild2=nchild
recode nchild2 1/9=1
gen nokids = nchild2==0
gen anykids = nchild2==1
assert (nokids+anykids)==1

sum agecat_*  female  xmarried  white  black  other  hisp  edudo  eduhs  edusc  edubd edugd inc00_25-inc100p hhsize_* anykids [aw=wtsupp], sep(0)

restore






	
// ***************************************************************************************
//			APX TABLE A02: TEXT OF THE VIGNETTES AND THE CONSEQUENCE MESSAGE
//						Not part of the Stata file
// ***************************************************************************************








// ***************************************************************************************
//			APX TABLE A03: BALANCE TESTS
// ***************************************************************************************

 
** Because we analyze the complexity and consequence-message treatments separately
** (uninteracted), we focus on the two pairwise balance tests.

** Columns 1-3 of Randomization check table
** ----------------------------------------
** Col. 1: means for variables when any_complexity==0
** Col. 2: means for variables when any_complexity==1
** Col. 3: p-value on test of equal means

** Col 1 & 2: baseline sample demographics and cognition
tabstat $demographics_balance cognix_pca if basesample, by(any_complexity) statistics(mean semean N) columns(statistics)

** append to col 1 & 2: indicators of missing data or attrition
tabstat attrit basesample miss_spread miss_anydemographic miss_cognix_pca, by(any_complexity) statistics(mean semean N) columns(statistics)
  
* Col 3: test equality of means in each row
foreach var of varlist $demographics_balance cognix_pca {
   qui reg `var' any_complexity if basesample, robust
   di "P-value on test of equal means = "  Ftail(e(df_m),e(df_r),e(F))   "               for `var' "
}

* Col 3, continued: test equality of means in each row of indicators of missing data or attrition 
foreach var of varlist attrit basesample miss_spread miss_anydemographic miss_cognix_pca {
   qui reg `var' any_complexity, robust
   di "P-value on test of equal means = "  Ftail(e(df_m),e(df_r),e(F))   "               for `var' "
}

* Joint test (only variables defined in the baseline sample)
* Report in the last row of the table
logit any_complexity $demographics cognix_pca if basesample, vce(robust)   
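** Added check: the joint-test p-value is the "Prob > chi2" in the logit header above.
** With vce(robust) this is a Wald test that all slope coefficients are zero; display it explicitly:
di "P-value on joint test (any_complexity) = " chi2tail(e(df_m), e(chi2))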



** Columns 4-6 of Randomization check table
** ----------------------------------------
** Col. 4: means for variables when consequence message ==0
** Col. 5: means for variables when consequence message ==1
** Col. 6: p-value on test of equal means


** Col 4 & 5: baseline sample demographics and cognition
tabstat $demographics_balance cognix_pca if basesample, by(consequence) statistics(mean semean N) columns(statistics)

** append to col 4 & 5: indicators of missing data or attrition
tabstat attrit basesample miss_spread miss_anydemographic miss_cognix_pca, by(consequence) statistics(mean semean N) columns(statistics) 

** Col 6: test equality of means in each row
foreach var of varlist $demographics_balance cognix_pca {
   qui reg `var' consequence if basesample, robust
   di "P-value on test of equal means = "  Ftail(e(df_m),e(df_r),e(F))   "               for `var' "
}

** Col 6, continued: test equality of means in each row of indicators of missing data or attrition 
foreach var of varlist attrit basesample miss_spread miss_anydemographic miss_cognix_pca {
   qui reg `var' consequence, robust
   di "P-value on test of equal means = "  Ftail(e(df_m),e(df_r),e(F))   "               for `var' "
}

** Joint test (only variables defined in the baseline sample)
** Report in the last row of the table
logit consequence $demographics cognix_pca if basesample, vce(robust)   
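** Added check: the joint-test p-value for this column is the "Prob > chi2" (robust Wald
** test of all slope coefficients being zero) in the logit header above, displayed explicitly:
di "P-value on joint test (consequence) = " chi2tail(e(df_m), e(chi2))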





		
// ***************************************************************************************
//			APX TABLE A04: Predictors of Heterogeneity in the Sell-Buy Spread in Control Sample
// ***************************************************************************************


** -------- Description of heterogeneity in spread in control sample ------- 
** Show to what extent demographics and the cognition index can explain the spread 
** Exclude the other experimental controls so that R2 shows variation explained by
** demographics and/or cognition index

** Column 1: Just demographics. Limit the sample to exclude the complexity treatments and the consequence message treatment
reg logspread                 $demographics if basesample & !any_complexity & !consequence, robust

** Rather than reporting each of these dummies, report p-values on their joint tests of being zero
testparm nhblack nhother hispanic
testparm ed_*
testparm hinc_*
testparm hhsiz_*
 
** Column 2: Just cognition index. Limit the sample to exclude the complexity treatments and the consequence message treatment
reg logspread     cognix_pca                if basesample & !any_complexity & !consequence, robust

** Column 3: Both. Limit the sample to exclude the complexity treatments and the consequence message treatment
reg logspread     cognix_pca  $demographics if basesample & !any_complexity & !consequence, robust

** Rather than reporting each of these dummies, report p-values on their joint tests of being zero
testparm nhblack nhother hispanic
testparm ed_*
testparm hinc_*
testparm hhsiz_*
		
		
		
		
		
		
	
// ***************************************************************************************
// 			APX TABLE A05: MAIN REGRESSIONS, all coefficients reported
// ***************************************************************************************
	    
** APX Table A05 shows the full output of the three regressions of Table 3.
**
** It has the same columns as Table 3, but many more rows: one row for each explanatory variable,
** as well as rows for R2 and N.

** See the output for Table 3 (reported earlier in the log file)
    
	

	
	
	
	
	
	
// ***************************************************************************************
// 			APX TABLE A06: MAIN REGRESSION but
//		Complexity Treatment Split out by Type of Complexity Treatment
// ***************************************************************************************
    
** This table has the same layout as Table 3 with 2 exceptions:
**
** 1. Instead of a single complexity treatment, there are two rows with different complexity
**    treatments. These are the first two rows of the table:
**    Row 1: "Complexity treatment: Wide Spread"  (complexity_2)
**    Row 2: "Complexity treatment: Added Info"   (complexity_3)
**
** 2. Just above the rows with R2 and N, insert the following row:
**    "P-value that coefficients on both complexity treatments are equal"


** ------ Column 1 --------
** Estimates for rows 1-5, row 7: the p-value on lump-sum shown first (under "P>|t|"), R2 and N are in the output below	
reg logspread complexity_2 complexity_3 consequence cognix_pca $exp_controls $demographics if basesample, robust
	
** Row 6: p-value on starting values
testparm ls_startvalue_*

** Row 8: p-value on benefit amounts
testparm ss_benefit_*

** Row 9: p-value on vignette names
testparm vignette_name_*

** Row 11: p-value on complexity treatments being equal
testparm complexity_*, equal



** ------ Column 2 --------
** Estimates for rows 1-5, row 7: the p-value on lump-sum shown first (under "P>|t|"), R2 and N are in the output below	
reg logsellprice complexity_2 complexity_3 consequence cognix_pca $exp_controls $demographics if basesample, robust


** Row 6: p-value on starting values
testparm ls_startvalue_*

** Row 8: p-value on benefit amounts
testparm ss_benefit_*

** Row 9: p-value on vignette names
testparm vignette_name_*

** Row 11: p-value on complexity treatments being equal
testparm complexity_*, equal



** ------ Column 3 --------
** Estimates for rows 1-5, row 7: the p-value on lump-sum shown first (under "P>|t|"), R2 and N are in the output below	
reg logbuyprice complexity_2 complexity_3 consequence cognix_pca $exp_controls $demographics if basesample, robust

** Row 6: p-value on starting values
testparm ls_startvalue_*

** Row 8: p-value on benefit amounts
testparm ss_benefit_*

** Row 9: p-value on vignette names
testparm vignette_name_*

** Row 11: p-value on complexity treatments being equal
testparm complexity_*, equal





// ***************************************************************************************
//			APX TABLE A07: FURTHER HETEROGENEITY IN TREATMENT EFFECTS
// ***************************************************************************************


** ----------- Specification 1: "By Gender" --------------
**
** Two Rows:   1. "Female"
**             2. "Male"
**

assert female<. if basesample

** Create interaction variables
gen anycompXfem1 = any_complexity*(female==1)
gen anycompXfem0 = any_complexity*(female==0)

gen consXfem1    = consequence*(female==1)
gen consXfem0    = consequence*(female==0)
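
** Added sanity check (assumes female is coded 0/1): each pair of interactions splits the
** treatment dummy by gender, so within the base sample the pair sums back to the original
** dummy. The -testparm ..., equal- commands below therefore test whether the treatment
** effect is the same for women and men.
assert anycompXfem1 + anycompXfem0 == any_complexity if basesample
assert consXfem1    + consXfem0    == consequence    if basesample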

** Output for specification 1
reg logspread   anycompXfem* consXfem* cognix_pca $exp_controls $demographics if basesample, robust
 
** P-value in the Complexity treatment column:
testparm anycompXfem*, equal
 
** P-value in the Consequence message treatment column: 
testparm consXfem*, equal

** Number of observations for each row (shown in square brackets in the last column); check that they add up to the N of the regression
tab female if basesample




** --------- Specification 2: "By Education" ------------------ 
**
** Two Rows:   1. "Some college or less"
**             2. "Bachelor's degree or more"
**

** Double check where the median is
tab edu_ix if basesample
assert edu_ix<. if basesample

** Create interaction variables
gen anycompXedu0 = any_complexity*(edu_ix<=3)
gen anycompXedu1 = any_complexity*(edu_ix>=4)

gen consXedu0    = consequence*(edu_ix<=3)
gen consXedu1    = consequence*(edu_ix>=4)

** Output for specification 2
reg logspread   anycompXedu* consXedu* cognix_pca $exp_controls $demographics if basesample, robust
 
** P-value in the Complexity treatment column:
testparm anycompXedu*, equal
 
** P-value in the Consequence message treatment column: 
testparm consXedu*, equal

** Number of observations for each row (shown in square brackets in the last column); check that they add up to the N of the regression
count if  basesample & (edu_ix<=3)
count if  basesample & (edu_ix>=4)



** ----------- Specification 3: "By Age" ----------------
**
** Two Rows:   1. "Below median (less than 50)"
**             2. "Above median (50 or more)"
**

** Find the median age and store it in a local, so the cutoff does not rely on r() surviving later commands
sum age if basesample, d
local agemed = r(p50)
assert age<. if basesample
gen age_abovemed = age > `agemed'

** Create interaction variables
gen anycompXage0= any_complexity*(age <= `agemed')
gen anycompXage1= any_complexity*(age >  `agemed')

gen consXage0= consequence*(age <= `agemed')
gen consXage1= consequence*(age >  `agemed')

** Output for specification 3
reg logspread   anycompXage* consXage* age_abovemed cognix_pca $exp_controls $demographics if basesample, robust
 
** P-value in the Complexity treatment column:
testparm anycompXage*, equal
 
** P-value in the Consequence message treatment column: 
testparm consXage*, equal

** Number of observations for each row (shown in square brackets in the last column); check that they add up to the N of the regression
tab age_abovemed if basesample


** ------------- Specification 4: "By Income" --------------
**
** Two Rows:   1. "Below median (less than $75k)"
**             2. "Above median ($75k or more)"
**

** Double check where the median is
sum income_cat if basesample, d
assert income_cat<. if basesample
gen inc_abovemed = income_cat > 3

** Create interaction variables
gen anycompXinc0= any_complexity*(1-inc_abovemed)
gen anycompXinc1= any_complexity*(inc_abovemed)

gen consXinc0= consequence*(1-inc_abovemed)
gen consXinc1= consequence*(inc_abovemed)

** Output for specification 4
reg logspread   anycompXinc* consXinc* inc_abovemed cognix_pca $exp_controls $demographics if basesample, robust
 
** P-value in the Complexity treatment column:
testparm anycompXinc*, equal
 
** P-value in the Consequence message treatment column: 
testparm consXinc*, equal

** Number of observations for each row (shown in square brackets in the last column); check that they add up to the N of the regression
tab inc_abovemed if basesample





  
// ***************************************************************************************
//			APX TABLE A08: Robustness of the Main Treatment Effects
// ***************************************************************************************
	
	
** Row 1: "Baseline" (the same regression as column 1 of Table 3)
reg logspread    any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust



** Panel A: Changing Cognition Measures
** ------------------------------------

** Row 2: "Cognition score is simple average"
reg logspread    any_complexity consequence cognix_avg $exp_controls $demographics if basesample, robust

** Row 3: "All five components of cognition score entered separately"
reg logspread    any_complexity consequence cog_fin cog_n1 cog_n2 cog_v1 cog_v2 $exp_controls $demographics if basesample, robust

** Row 4: "Financial literacy is only cognition measure"
reg logspread    any_complexity consequence cog_fin                             $exp_controls $demographics if basesample, robust

** Row 5: "Numeracy measures are only cognition measures"
reg logspread    any_complexity consequence         cog_n1 cog_n2               $exp_controls $demographics if basesample, robust

** Row 6: "Verbal measures are only cognition measures"
reg logspread    any_complexity consequence                       cog_v1 cog_v2 $exp_controls $demographics if basesample, robust

** Row 7: Additional controls for cognition, knowledge, and financial experience 
reg logspread    any_complexity consequence ///
                 admc__Sframe* admc__Stime* admc__Ssubset* ///
                 ssa__Sknowledge* ssa__Sliteracy* ssa__Sconfident* ///
                 afin__annuity* afin__irakeogh* ///
                 aplan__index* ///
                 cognix_pca $exp_controls $demographics if basesample, robust




** Panel B: Sample Selection
** ------------------------------------

** Reminder: gen byte basesample = ~(miss_spread | miss_anydemographic | miss_cognix_pca)			

** Dummy out the demographics
foreach var of varlist $demographics {
    gen DQ_`var'=`var'
    gen byte DM_`var' = missing(`var')
    replace DQ_`var'=0 if DM_`var'
}

** Dummy out cognition index
gen CQ_cognix_pca = cognix_pca
gen byte CM_cognix_pca =missing(cognix_pca)
replace CQ_cognix_pca=0 if CM_cognix_pca

** Row 8: "Include observations with missing demographics (dummied out)"
reg logspread    any_complexity consequence cognix_pca $exp_controls DQ_* DM_* if ~miss_cognix_pca, robust

** Row 9: "Include observations with missing cognition index (dummied out)"
reg logspread    any_complexity consequence CQ_* CM_* $exp_controls $demographics if ~miss_anydemographic, robust

** Row 10: "Include observations with any missing values (dummied out)"
reg logspread    any_complexity consequence CQ_* CM_* $exp_controls DQ_* DM_*, robust

** double check that N matches the count below
count if ~miss_spread
assert e(sample)==~miss_spread 

** Row 11: "Exclude Native American and LA County oversamples"
reg logspread    any_complexity consequence cognix_pca $exp_controls $demographics if basesample & nationalsample, robust




** Panel C: Different controls
** ------------------------------------

** Row 12: "No Cognition Controls"
reg logspread    any_complexity consequence            $exp_controls $demographics if basesample, robust

** Row 13: "No Demographic Controls"
reg logspread    any_complexity consequence cognix_pca $exp_controls               if basesample, robust

** Row 14: "No Secondary Experimental Controls"
reg logspread    any_complexity consequence cognix_pca               $demographics if basesample, robust




** Panel D: Adjustments to the outcome variable
** ---------------------------------------------------

** Row 15: "Buy and sell valuations topcoded at $100,000"
reg logspread_100k any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust


** Row 16: "Topcoding spread at the 90th percentile"
sum logspread if basesample, d
gen logspread_top90 = logspread
replace logspread_top90 =  r(p90) if logspread>r(p90) & logspread<.
reg logspread_top90 any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust

** Row 17: "Bottomcoding buy and sell valuations at $1000"
gen logspread_1k = abs(log((max(1000,sellpricesup)+max(1000,sellpriceinf))/2)   ///
                     - log((max(1000, buypricesup)+max(1000, buypriceinf))/2))  if logspread<.
reg logspread_1k any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust

** Row 18: "Spread set to zero if spread ≤ 0.50"
gen logspread_tol50 = logspread
replace logspread_tol50 = 0 if logspread<=0.50
reg logspread_tol50 any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
di "A 0.50 log difference translates to a factor of: " exp(0.50)  

** Row 19: "Spread set to zero if buy valuation > sell valuation"
gen logspread_asym = logspread
replace logspread_asym = 0 if logbuyprice>logsellprice & logbuyprice<.
reg logspread_asym any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust







// ***************************************************************************************
//			APX TABLE A09: Other Predictors of the Sell-Buy Spread
// ***************************************************************************************

** Row 1: "Cognition Index (Standardized)"
reg logspread cognix_pca                                 , robust

** Row 2: "Decision-Making Competence, Framing Consistency (Standardized)"
reg logspread admc_Sframe*                               , robust

** Row 3: "Decision-Making Competence, Time Conjunction (Standardized)"
reg logspread admc_Stime*                                , robust

** Row 4: "Decision-Making Competence, Subset Consistency (Standardized)"
reg logspread admc_Ssubset*                              , robust

** Row 5: "Self-Assessed Knowledge about Social Security (Standardized)"
reg logspread ssa_Sknowledge*                            , robust

** Row 6: "Social Security Literacy Score (Standardized)"
reg logspread ssa_Sliteracy*                             , robust

** Row 7: "Confidence that Social Security will Pay Benefits (Standardized)"
reg logspread ssa_Sconfident                             , robust

** Row 8: "Receives Annuity Income (Dummy)"
reg logspread afin_annuity                               , robust

** Row 9: "Owns an IRA or Keogh (Dummy)"
reg logspread afin_irakeogh                              , robust

** Row 10: "Ability and Comfort with Retirement Planning (Standardized)"
reg logspread aplan_index                                , robust




	
// ***************************************************************************************
//			APX TABLE A10: Characteristics of People with Buy Values Exceeding Sell Values
// ***************************************************************************************

** Define a categorical variable for differences between the buy and sell value
** (It is more than we need here, but it will be used later on again)
gen logsellbuydiff_cats = .
replace logsellbuydiff_cats = 1 if logsellbuydiff <  -1 
replace logsellbuydiff_cats = 2 if logsellbuydiff >= -1 &  logsellbuydiff <0  
replace logsellbuydiff_cats = 3 if logsellbuydiff ==0
replace logsellbuydiff_cats = 4 if logsellbuydiff > 0   &  logsellbuydiff <=1
replace logsellbuydiff_cats = 5 if logsellbuydiff > 1   &  logsellbuydiff <.

label def logsellbuydiff_cats 1 "logdif<-1" 2 "-1<=logdif<0" 3 "logdif=0" 4 "0<logdif<=1" 5 "1<logdif"
label val logsellbuydiff_cats logsellbuydiff_cats 
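
** Added sanity check: the five categories partition the real line (<-1, [-1,0), 0, (0,1], >1),
** so every non-missing log difference should be assigned a category.
assert logsellbuydiff_cats<. if logsellbuydiff<.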


** Summarize variables for the group with buy>sell
sum agecat_* $demographics_balance cognix_pca admc_Sframe admc_Stime admc_Ssubset ssa_Sknowledge ///
             ssa_Sliteracy ssa_Sconfident afin_annuity afin_irakeogh aplan_index ///
             if basesample & (logsellbuydiff_cats==1 | logsellbuydiff_cats==2), sep(0)   

** Summarize variables for the group with buy<sell
sum agecat_* $demographics_balance cognix_pca admc_Sframe admc_Stime admc_Ssubset ssa_Sknowledge ///
             ssa_Sliteracy ssa_Sconfident afin_annuity afin_irakeogh aplan_index ///
             if basesample & (logsellbuydiff_cats==4 | logsellbuydiff_cats==5), sep(0)  

  
** Calculate the significance of the differences using t-tests
gen tmp=logsellbuydiff_cats
recode tmp 1 2 = 1 3=. 4 5=0    /* dummy for being in buy>sell group as opposed to sell>buy */
foreach x of varlist agecat_* $demographics_balance cognix_pca admc_Sframe admc_Stime admc_Ssubset ///
                     ssa_Sknowledge ssa_Sliteracy ssa_Sconfident afin_annuity afin_irakeogh aplan_index {
  qui ttest `x', by(tmp) unequal
  di "P-value for t-test of difference is "  %8.4f r(p) "  for `x' "
}   


** test for joint significance in a regression, to report in the last row of the table
reg tmp $demographics cognix_pca admc__Sframe* admc__Stime* admc__Ssubset* ssa__Sknowledge* ///
        ssa__Sliteracy* ssa__Sconfident* afin__annuity* afin__irakeogh* aplan__index* if basesample, robust

drop tmp


    
// ***************************************************************************************
//	 TEXT CLAIMS: Results mentioned in the text but that don't appear in tables
// ***************************************************************************************


** Text Claim 01 - Section 2.4 (Footnote #9) 	
** -----------------------------------------
** "Respondents took on average about 30% longer to read and process the vignettes 
** of the complexity treatment than the control vignette (“no added complexity”), 
** and the text of vignettes of the complexity treatment required a reading 
** comprehension 0.9 grade levels higher, according to the Flesch-Kincaid scale."
**
** Note: the comprehension grade levels were found by importing the text of the 
**       vignette into MS Word, and using the reading comprehension tool in
**       MS Word


tab complexity if basesample, sum(time_ad_intro)
reg time_ad_intro complexity_2 complexity_3 if basesample, robust 
di "Average percent increase in reading time, both complexity treatments: "  50*(_b[complexity_2]+_b[complexity_3])/_b[_cons]
di "Percent increase in reading time for complexity: wide age range: " 100*_b[complexity_2]/_b[_cons]
di "Percent increase in reading time for complexity: extra info:     " 100*_b[complexity_3]/_b[_cons]
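
** Added sketch: the first display line is the average of the two arm-specific percent
** increases, 100*((b2+b3)/2)/b_cons = 50*(b2+b3)/b_cons, where _b[_cons] is the mean reading
** time for the control vignette (assuming complexity_2 and complexity_3 are mutually exclusive
** arm dummies with the control as the omitted group). -nlcom- reports the same number with a
** delta-method standard error based on the robust VCE of the regression above:
nlcom avg_pct_increase: 50*(_b[complexity_2]+_b[complexity_3])/_b[_cons]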




** Text Claim 02 - Section 2.4 
** ---------------------------
** "These factual questions are two multiple choice questions about the financial 
** advisor’s explanation of the benefits and drawbacks under each scenario (spending
** down slowly or quickly). Of the respondents who are posed these two factual questions, 
** 63% answer both correctly, 27% answer one correctly, and 10% answer neither correctly."

** What fraction of people correctly answered the follow-up questions to the consequential intervention?
gen correct_followup1=(test_question1==3) if consequence & basesample
gen correct_followup2=(test_question2==3) if consequence & basesample
gen correct_followup_total = correct_followup1+correct_followup2

label var correct_followup_total "Total Questions Correct (0-2)"

** As expected, the number of correct answers varies strongly by cognition
** Last column shows the percentages answering both correctly and exactly one correctly.
tab correct_followup_total cognix_xtile4 if consequence & basesample, col nof


	
	
** Text Claim 03 - Section 3.2 	
** ---------------------------
** "Respondents advise our hypothetical vignette individuals to buy an annuity 
** that pays $100 per month for a median price of $4,750 (s.e.: $180) but advise 
** them to sell this annuity for a median price of $16,250 (s.e.: $543)."
 
** summarize to get the medians and run median regressions to get standard errors 
sum  midbuy  if ~sell_first & basesample, d
qreg midbuy  if ~sell_first & basesample

sum  midsell if  sell_first & basesample, d
qreg midsell if  sell_first & basesample
   
   
   
   
** Text Claim 04 - Section 3.2 	
** ---------------------------
** "This represents a statistically significant difference (two-sample 
** Wilcoxon-Mann-Whitney rank-sum test z-statistic=25.8, p-value<0.001)."
**
** These come from different (and independent) samples because sell_first is randomized
** across respondents, so we use an exact Wilcoxon-Mann-Whitney rank-sum test. Note that
** -ranksumex- is a user-written command (install with: ssc install ranksumex).
** To implement this, the data from the different groups needs to be in the 
** same variable. Create this variable first:

gen     valuation=.
replace valuation=midbuy  if ~sell_first & basesample
replace valuation=midsell if  sell_first & basesample
ranksumex valuation, by(sell_first)
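
** Added comparison: official Stata's -ranksum- runs the same two-sample test using the
** large-sample normal approximation and reports the z statistic directly.
ranksum valuation, by(sell_first)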
drop valuation


   
** Text Claim 05 - Section 3.2	
** ---------------------------
** "Only about 10 percent of respondents have a buy value that is equal to their 
** sell value, and only 40 percent have a buy and sell value that are within one
** log unit (i.e., within a factor of 2.72) of each other."

tab logsellbuydiff_cats if basesample



** Text Claim 06 - Section 3.2 (answered by same output as text claim 05)	
** ----------------------------------------------------------------------
** "Second, the distribution is not symmetric around zero: 63% have sell valuations 
** that strictly exceed their buy valuations, whereas buy valuations strictly exceed
** sell valuations for about 27% of respondents."

tab logsellbuydiff_cats if basesample


   
** Text Claim 07 - Section 3.2	
** ---------------------------
** "A further similarity is that we also find that the log buy and the log sell 
** valuations are negatively correlated (correlation coefficient: -0.11, p-value<0.001)."

pwcorr logbuyprice logsellprice if basesample, sig




** Text Claim 08 - Section 3.3 (Footnote #15)	
** ------------------------------------------
** "We do not control for the order in which the two blocks of consequence message
** treatment were shown because this variable is available for only half the sample. 
** Within the half of the sample for which this order was randomized, the order 
** has no significant effect on the spread (p-value: 0.758)."

** There is no effect of the order of the test questions used in the consequence treatment on the spread
** Look at the P>|t| value for the quick_first variable
reg logspread    any_complexity quick_first consequence cognix_pca $exp_controls $demographics if basesample & consequence, robust



 
** Text Claim 09 - Section 3.3	
** ---------------------------
** "While the estimates seem to indicate that the complexity treatment primarily 
** operates on the buy price, and hence it reduces the average of the log sell and 
** buy price, this is not a valid interpretation as we cannot reject that increase
** in the sell price and the decrease in the buy price are the same in absolute 
** value (p-value: 0.302)."
 
** The effect of the treatments on the mean annuity valuation
** Look at the P>|t| value for the any_complexity variable
reg meanlogprice any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
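** Cross-check of the interpretation above. This is a sketch that assumes
** meanlogprice was defined earlier as (logbuyprice + logsellprice)/2. OLS is
** linear in the dependent variable, so on a common estimation sample the
** any_complexity coefficient for the mean equals the average of its buy and
** sell coefficients. Testing that it is zero is therefore the same as testing
** that the sell increase and the buy decrease are equal in absolute value.
quietly reg logbuyprice  any_complexity consequence cognix_pca $exp_controls $demographics if basesample & !missing(logsellprice), robust
scalar b_buy_check  = _b[any_complexity]
quietly reg logsellprice any_complexity consequence cognix_pca $exp_controls $demographics if basesample & !missing(logbuyprice), robust
scalar b_sell_check = _b[any_complexity]
quietly reg meanlogprice any_complexity consequence cognix_pca $exp_controls $demographics if basesample & !missing(logbuyprice, logsellprice), robust
scalar b_mean_check = _b[any_complexity]
di "Average of buy and sell coefficients: " (scalar(b_buy_check) + scalar(b_sell_check))/2
di "Coefficient in the mean regression:   " scalar(b_mean_check)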




** Text Claim 10 - Section 3.3 (answered by same output as text claim 09)
** ----------------------------------------------------------------------
** "In fact, it marginally significantly increases the average of the log buy and 
** sell price (p-value 0.073), suggesting that the consequence message not only 
** increases the rationality of the annuity valuations but also raises the levels."

** Effect of the treatments on the mean annuity valuation
** Look at the P>|t| value for the consequence variable
reg meanlogprice any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust





** Text Claim 11 - Section 3.3 (Footnote #17)	
** ------------------------------------------
** "One might expect that people with an initially higher Social Security benefit 
** place a lower value on a $100 change in Social Security benefits, since they are 
** already more highly annuitized. To test this, we run an alternative specification 
** in which the baseline Social Security benefit amount is included as a linear control
** instead of as a set of dummy variables. Both the buy and sell value decline in the 
** baseline amount of Social Security benefits. The effect is not significant for the sell
** value (p-value 0.145), but there is a significant 2.5% decline in the buy value for 
** each additional $100 in baseline Social Security benefits."

** Effect of ss_benefit100dollar on the log spread
reg logspread    any_complexity consequence cognix_pca $exp_controls_linben $demographics if basesample, robust

** Effect of ss_benefit100dollar on the log sell price
reg logsellprice any_complexity consequence cognix_pca $exp_controls_linben $demographics if basesample, robust

** Effect of ss_benefit100dollar on the log buy price
reg logbuyprice  any_complexity consequence cognix_pca $exp_controls_linben $demographics if basesample, robust
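** Reading the coefficient on the linear benefit control. This is a sketch that
** assumes the global exp_controls_linben contains ss_benefit100dollar, measured
** in units of 100 dollars, as the comments above indicate. In a regression of the
** log buy value, 100*b approximates the percent change per additional 100 dollars
** of baseline benefits, and 100*(exp(b)-1) is the exact percent change.
** These lines only display values and do not change the data.
di "Approximate % change in buy value per 100 dollars: " 100*_b[ss_benefit100dollar]
di "Exact % change in buy value per 100 dollars:       " 100*(exp(_b[ss_benefit100dollar]) - 1)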



** Text Claim 12 - Section 3.3	
** ---------------------------
** "To alleviate concerns about multiple hypotheses testing, we also test whether 
** our two key experimental manipulations, the consequence message and the complexity
** treatment, are jointly zero: we reject this hypothesis with a p-value of 0.0106. 
** The p-value becomes 0.0256 if we do not pool the complexity treatment, i.e., when 
** we test that the consequence message, the wide-age-range complexity treatment, and
** the extra-information complexity treatment are jointly zero. If we include all the 
** secondary experimental manipulations in the joint test, we can reject that all 
** treatment effects are jointly zero with a p-value of 0.0098 when the complexity 
** treatments are pooled and with a p-value of 0.0148 when the complexity treatments
** are separated out."

** -------- Pooling Complexity Treatments ---------
** Re-run the column 1 regression from Table 3 so that testparm can be run on its estimates
reg logspread    any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust

** Joint test of key treatments
testparm any_complexity consequence
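** testparm stores the joint p-value in r(p). With the robust VCE from the
** regression above, this is a Wald F-test that the listed coefficients are
** jointly zero. Printing it directly eases comparison with the text.
di "Joint test of complexity and consequence, p-value: " %6.4f r(p)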

** Joint test of all treatments including secondary ones (even though the secondary ones were not expected to be significant)
testparm any_complexity consequence $exp_controls

** -------- Not Pooling Complexity Treatments -------------
** Re-run the column 1 regression from Appendix Table A06 so that testparm can be run on its estimates
reg logspread complexity_2 complexity_3 consequence cognix_pca $exp_controls $demographics if basesample, robust

** Joint test of key treatments
testparm complexity_2 complexity_3 consequence

** Joint test of all treatments including secondary ones (even though the secondary ones were not expected to be significant)
testparm complexity_2 complexity_3 consequence $exp_controls





** Text Claim 13 - Section 3.3	
** ---------------------------
** "What would annuity valuations be if we had an intervention sufficiently powerful
** to cause the mean log sell price and the mean log buy price to be equal (so no 
** deviation from rationality at the mean)? We can get a rough answer to this question 
** by extrapolating the effects of each of our two main experimental interventions. 
** The mean log difference between sell and buy price is 1.01 (see Figure 2), and the
** consequence message moves log sell and buy price closer by 0.122 (=0.133-0.011, see 
** columns 2 and 3 of Table 3). Thus, a treatment about 8 ≈ 1.01/0.122 times more powerful 
** than our current consequence message would close the gap between the mean log sell and buy 
** price. At that level of treatment, the median sell and buy price would be predicted to
** be about $17,000. Similarly, we can extrapolate the complexity treatment, in the direction
** of making the problem less complex, such that the sell and buy price coincide. This would 
** require reducing complexity by about 5 times the amount of complexity added by our 
** complexity treatment. The resulting sell and buy price would then be predicted to be
** about $12,000."
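** Arithmetic check of the quote above, using the rounded values reported in the
** text. They are typed in here, not re-estimated. Display only.
local gap       = 1.01           // mean log sell - log buy difference (Figure 2)
local conseq_fx = 0.133 - 0.011  // net narrowing from the consequence message (Table 3, cols 2-3)
di "Gap closed by the current consequence message: " `conseq_fx'
di "Required multiple of the current message:      " `gap'/`conseq_fx'
** The ratio is about 8.3, which the text rounds to 8.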


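** 1. Predict buy and sell values at the CONSEQUENCE level that makes the mean log sell and log buy prices equal
** Run an auxiliary regression of the signed (not absolute) log sell minus log buy difference. Because the model is
** linear, the predicted mean difference at consequence level c is mean_diff + _b[consequence]*(c - mean(consequence));
** setting this to zero gives c = mean(consequence) - mean_diff/_b[consequence].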

** Generate mean_logsellbuydiff global 
sum logsellbuydiff if basesample
global mean_logsellbuydiff=r(mean)

reg logsellbuydiff    any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
sum consequence if basesample
global required_consequence = r(mean) -  $mean_logsellbuydiff/_b[consequence] 

di "Level of consequence where log buy and sell prices are equal: " $required_consequence

** Now predict logbuyprice at the "required" level of consequence
reg logbuyprice   any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
gen logbuyprice_reqconsequence = logbuyprice + _b[consequence]*($required_consequence - consequence) if basesample
gen    buyprice_reqconsequence = exp(logbuyprice_reqconsequence)

sum logbuyprice                if basesample, d
sum logbuyprice_reqconsequence if basesample, d

di "The exp of log buy price that equals log sell price is: " exp(r(mean))

sum midbuy                  if basesample, d
sum buyprice_reqconsequence if basesample, d

** Now predict logsellprice at the "required" level of consequence
reg logsellprice   any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
gen logsellprice_reqconsequence = logsellprice + _b[consequence]*($required_consequence - consequence) if basesample
gen    sellprice_reqconsequence = exp(logsellprice_reqconsequence)

sum logsellprice                if basesample, d
sum logsellprice_reqconsequence if basesample, d

di "The exp of log sell price that equals log buy price is: " exp(r(mean))

sum midsell                  if basesample, d
sum sellprice_reqconsequence if basesample, d


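** 2. Predict buy and sell values at the COMPLEXITY level that makes the mean log sell and log buy prices equal
** Run an auxiliary regression of the signed (not absolute) log sell minus log buy difference. By the same linear
** extrapolation as above, the required level is mean(any_complexity) - mean_diff/_b[any_complexity].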
reg logsellbuydiff    any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
sum any_complexity if basesample
global required_complexity = r(mean) -  $mean_logsellbuydiff/_b[any_complexity]
di "Level of complexity where log buy and sell prices are equal: " $required_complexity

** Now predict logbuyprice at the "required" level of complexity
reg logbuyprice   any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
gen logbuyprice_reqcomplex = logbuyprice + _b[any_complexity]*($required_complexity - any_complexity) if basesample
gen    buyprice_reqcomplex = exp(logbuyprice_reqcomplex)

sum logbuyprice            if basesample, d
sum logbuyprice_reqcomplex if basesample, d

di "The exp of log buy price that equals log sell price is: " exp(r(mean))

sum midbuy              if basesample, d
sum buyprice_reqcomplex if basesample, d

** Now predict logsellprice at the "required" level of complexity
reg logsellprice   any_complexity consequence cognix_pca $exp_controls $demographics if basesample, robust
gen logsellprice_reqcomplex = logsellprice + _b[any_complexity]*($required_complexity - any_complexity) if basesample
gen    sellprice_reqcomplex = exp(logsellprice_reqcomplex)

sum logsellprice            if basesample, d
sum logsellprice_reqcomplex if basesample, d

di "The exp of log sell price that equals log buy price is: " exp(r(mean))

sum midsell              if basesample, d
sum sellprice_reqcomplex if basesample, d



** Text Claim 14 - Appendix "Discussion of Robustness" 
** ---------------------------------------------------

** "This implies that there is an implicit topcode of $100,000 on the buy valuations,
** though respondents are permitted to give a buy recommendation at a price in excess
** of $100,000, and 9% of them do so."

** For mentioning in the text: the fraction of buy valuations exceeding $100k
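** Exclude missing midbuy explicitly: Stata treats missing as larger than any number, so it would count as > 100000
gen topcodebuy = midbuy>100000 if basesample & !missing(midbuy)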
tab topcodebuy
	
	

log close






